User Details
- User Since: Feb 10 2016, 9:51 AM
Wed, Jun 22
Thank you for adding this information!!
Tue, Jun 21
Latest diff looks good to me.
LGTM.
May 24 2022
Changes LGTM with inline nits.
@nikic I have not verified compile time impact with the latest diff.
May 17 2022
May 16 2022
Following up on the regression I was seeing with https://reviews.llvm.org/D114171, the only meaningful IR change seems to be:
Before the patch (after SLP):
  %65 = shufflevector <2 x float> %9, <2 x float> poison, <4 x i32> <i32 0, i32 1, i32 undef, i32 undef>
  %66 = insertelement <4 x float> %65, float %16, i32 2
  %67 = insertelement <4 x float> %66, float %15, i32 3
  %68 = call <4 x float> @llvm.fabs.v4f32(<4 x float> %67)
  %69 = fcmp oeq <4 x float> %68, <float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000, float 0x7FF0000000000000>
  %70 = freeze <4 x i1> %69
  %71 = call i1 @llvm.vector.reduce.or.v4i1(<4 x i1> %70)
May 12 2022
May 10 2022
May 9 2022
May 8 2022
May 5 2022
We recently root-caused a 20% regression in eigen complex to this patch, and it is blocking our compiler release.
I need some time to do additional testing, given there are at least a few patches on top of this. So far I see a crash fix, two NFCs and two additional changes that may also affect performance (https://reviews.llvm.org/D114171 and https://reviews.llvm.org/D115750).
Can you please hold off on any additional changes while we test further, or include them here for review so I can check if they resolve the regressions seen so far?
May 3 2022
I'm seeing some fairly big regressions with this patch, specifically on Rome (AMD) architecture.
A couple of examples that are public in the test suite: SingleSource/Benchmarks/Shootout: for sieve I'm seeing a 20% performance regression in an opt build and an xfdo one, and for MicroBenchmarks/ImageProcessing/Dither 10% regression (opt, thinlto and xfdo).
I'm also seeing a couple on Skylake (opt build) in the range of 5-13%, an example being eigen with a 13% regression; this may be harder to track down as it's in a specific configuration, but let me know if you want to reproduce this one.
Apr 25 2022
Noted, I will run some testing this week.
Thank you for the cleanup!
Maybe worth a mention that this effectively enables SimpleLoopUnswitch for the legacy PM, in case any users need this clarification; more of a nit, considering the LPM is being removed.
Apr 22 2022
As a side note, the fact that the freeze instructions need LICM and SimpleLoopUnswitch interleaved will likely affect the plan we had around splitting off the non-trivial unswitch functionality into a separate function pass (the context for that is using divergence analysis: https://reviews.llvm.org/D109762).
It seems like tracking the added freeze instructions and including the functionality of hoisting them as part of the "new" non-trivial SLU pass would resolve this, but I'd like to hear folks' feedback on this.
+ 1 to please clarify in the test why this is necessary.
Without the context, the question will be: why not generate the IR after LICM and only test SLU? Clarify that the test needs the full sequence: SLU on the inner loop, then LICM to hoist the freeze condition, and then SLU on the outer loop.
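For illustration, the sequence being described could be exercised with a RUN line along these lines (a sketch only; the actual test's pass ordering, parameters, and flags may differ):

```
opt -S \
  -passes='simple-loop-unswitch<nontrivial>,licm,simple-loop-unswitch<nontrivial>' \
  test.ll
```

Spelling out the three-pass sequence in the RUN line makes the dependency explicit, rather than leaving a reader to wonder why a single SLU invocation isn't enough.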
Apr 7 2022
Added nits, but happy if this goes in as is too.
This cleanup is beautiful!
Thank you!!
Apr 6 2022
+1 to the additional cleanup.
Apr 5 2022
Thank you!
Mar 29 2022
Looks good to me as well.
Mar 28 2022
Mar 24 2022
lgtm
Mar 22 2022
I'm seeing no meaningful difference in performance from this as is. Let's see if there are other reports and we can increase the limit if this is the case.
Mar 17 2022
As discussed offline, the potential issue with this change is whether we encounter a pathological case in the "opposite direction".
Update based on comments.
Thanks for the suggestions, updated a couple more places.
LGTM with a couple nits.
Mar 15 2022
Sent out https://reviews.llvm.org/D121740.
LGTM for this patch on the impact side, testing looks good.
LGTM.
Mar 14 2022
Reading the details in your post and looking at the SLPVectorizer code I think there's the potential for the pass to benefit from using MemorySSA. However I do not understand enough of the SLPVectorizer mechanics to guess if this will work out or not.
In this context, I think it makes sense to have incremental changes while keeping building and updating MSSA off by default; as you mentioned in the post, all changes can (and should) be reverted if a later decision is to pursue something else.
Mar 10 2022
Given the situation, I agree this is a reasonable way to handle it.
Thank you for sending this out!
Mar 3 2022
Mar 1 2022
Feb 25 2022
Affected code is close to or identical to the modules in: https://github.com/llvm/llvm-test-suite/tree/main/Bitcode/Regression/fft
Thank you for looking at this! Please see inline.
Feb 24 2022
@chill mentioned he did some early testing on that, so he might have some numbers.
From my side, it's too early. I'm seeing a correctness issue in the early testing that needs resolving. I'm looking into getting @chill something actionable there.
I have a couple of high-level comments/clarifications:
- Are all mini-transforms within GVN moving to MSSA in this case?
If they needed MemDep before, yes.
- Is the move to MemorySSA for performance of cases not caught through MDA?
It should address at least one of these, or both. The primary goal is to remove the last user of MDA and remove the analysis from trunk. The implementation for the transition should be such that it either catches more cases or provides improved performance (or both). From previous experience with switching DSE, this is possible if implemented right.
@asbirlea, a question to clarify my understanding around enabling MSSA for a pass:
- We optimistically build MemorySSA for passes that unconditionally require it. This building and preservation can be costly from a compile-time perspective. We may also get invalidation by later function passes which do not preserve MemorySSA, with different implications depending on where GVN sits in the pipeline. For example, ending up with something like:
LPM {
  ...
  LICM        <-- requires MSSA
}
instcombine   <-- invalidates MSSA
GVNPass()     <-- builds MSSA again
simplifyCFG   <-- invalidates MSSA
is arguably much worse in compile time than:
LPM {
  ...
  LICM        <-- requires MSSA
}
GVNPass()     <-- uses preserved MSSA
instcombine   <-- invalidates MSSA
simplifyCFG
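For concreteness, the two orderings above roughly correspond to pipeline strings like the following (hypothetical pipelines for illustration, not the actual default pipeline):

```
-passes='loop-mssa(licm),instcombine,gvn,simplifycfg'   ; MSSA invalidated, rebuilt for GVN
-passes='loop-mssa(licm),gvn,instcombine,simplifycfg'   ; GVN reuses the MSSA preserved by LICM
```

The `loop-mssa(...)` adaptor is what forces LICM to build and preserve MemorySSA; where GVN is placed relative to the invalidating passes then determines whether that analysis gets reused or rebuilt.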
Feb 22 2022
Feb 18 2022
Thank you for starting this!
From my side, LGTM.
Feb 17 2022
lgtm
Feb 16 2022
Feb 15 2022
I think the clearing should be added at the beginning of insertDef and insertUse; a similar clearing of visited phis is already done in those same two places. I haven't had a chance to test it yet, though.
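A minimal sketch of what I mean (member and parameter names here are illustrative and may not match the actual MemorySSAUpdater internals):

```
// Sketch only -- follows MemorySSAUpdater loosely.
void MemorySSAUpdater::insertDef(MemoryDef *MD, bool RenameUses) {
  // Clear per-update cached state up front, mirroring the existing
  // clearing of visited phis at the start of insertDef/insertUse.
  VisitedBlocks.clear();
  // ... existing insertion logic ...
}
```

The point is that any state cached during one update must not leak into the next one, the same way the visited-phi sets are already reset.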
Feb 10 2022
lgtm.
Feb 9 2022
Feb 8 2022
Just a couple of quick comments:
- Why take out the option to DisableGVNLoadPRE?
- Can the MemDep and MSSA be mutually exclusive options? (i.e. enabling one disables the other)
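One way to get the mutual exclusion is a single flag, which sidesteps conflicting combinations entirely (flag name here is hypothetical, not from the patch):

```
// Sketch: one flag selects the implementation; enabling MemorySSA
// implicitly disables the MemDep path, so the two can never both be on.
static cl::opt<bool> GVNUseMemorySSA(
    "gvn-use-memoryssa", cl::init(false),
    cl::desc("Use MemorySSA instead of MemDep in GVN"));

static bool useMemDep() { return !GVNUseMemorySSA; }
```

Two independent booleans would instead need an explicit check that rejects enabling both at once.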
Feb 4 2022
AFAICT, this patch doesn't make any difference for https://github.com/llvm/llvm-project/issues/52858.
Feb 2 2022
Reproducer with llc -O2: https://reviews.llvm.org/P8277
Jan 25 2022
Thank you for sending this fix!
It feels like a pretty big hammer, but some of the tests being modified here are exact examples of the miscompile I was seeing. I'm good with it at this point.
Any chance you could profile where most time is spent?
Quick update: I'm not seeing any major run time regressions, but there are a few unrelated issues that may make the results I'm seeing incomplete.
I'm seeing some compile-time regressions; I'd be curious to see what the updated compile-time-tracker results look like.
Thank you!
Jan 20 2022
A very broad comment for a smooth transition:
- the patches should be split into changes to GVN independent of the switch, e.g. a) factoring out components/methods where the differences between MSSA and MD occur and b) things that need to be added for the transition
- we should keep the MD implementation and have a flag for switching between the current one and the MSSA one being added
- once the flag flip sticks, then we'd remove the MD implementation
Jan 19 2022
Thank you for resolving all the issues raised! Could you rebase this patch on ToT? (to include the MSSA update and the move to LPM1?)
Jan 18 2022
Changes LGTM.
Nit: clang-format.
A simpler way to run all the verification now is to pass VerificationLevel::Full to verifyMemorySSA(). I'm mentioning this for your local testing, in case your build doesn't already have EXPENSIVE_CHECKS on; it's not meant to be included in this patch.
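For reference, the call would look something like this in a local debug build (sketch only, not part of the patch):

```
#ifndef NDEBUG
  // Full verification is expensive; use only while debugging locally,
  // as a stand-in for an EXPENSIVE_CHECKS build.
  MSSA->verifyMemorySSA(VerificationLevel::Full);
#endif
```
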
Dec 13 2021
Nov 16 2021
Thank you for this work!
Nov 11 2021
Allowing passes to do this is a slippery slope...
We already have the issue that it is sometimes hard to reproduce a problem with a single pass, due to the cached state in MemorySSA. This cached state already depends on what other passes do, a mix of queries and transforms, which may leave MemorySSA either "over-optimized" (it has stored info beyond what it could deduce if built from scratch) or "under-optimized" (it could deduce more). Having DSE optimize more is not that far-fetched considering this.
I'm inclined to let DSE do this, only because 1) the traversals and inferences are beyond what MSSA alone does now and 2) they're "free" (i.e. DSE does them anyway).
However, I would not be ok with any pass being able to set optimized accesses.
Thank you for all the clarifications. I'm convinced this is the right way to move forward at this point.
Nov 9 2021
LGTM, but please wait to see if @nikic has additional comments.
Nov 8 2021
Thank you for the additional clarifications in the comments and descriptions!
The "Request Changes" is for the use of getID() in MemorySSA. Other optimizations should never rely on the values of the IDs assigned by MemorySSA.
Nov 4 2021
Can you also update lib/LTO/LTOBackend.cpp to something like:
  // Parse a custom AA pipeline if asked to, otherwise the default will be used.
  if (!Conf.AAPipeline.empty()) {
    AAManager AA;
    if (auto Err = PB.parseAAPipeline(AA, Conf.AAPipeline)) {
      report_fatal_error(Twine("unable to parse AA pipeline description '") +
                         Conf.AAPipeline + "': " + toString(std::move(Err)));
    }
    // Register the AA manager first so that our version is the one used.
    FAM.registerPass([&] { return std::move(AA); });
  }
This change's numbers look better overall. The compile-time regressions are smaller across the board and the memory gains are still there, in some cases reduced and in some cases improved. I don't think the level of noise is that high.
Overall, this change is the right approach both conceptually (only invalidate in these adaptors) and given the practical results.
Nov 3 2021
Adding a few items/questions discussed in the open meeting:
Nov 1 2021
Oct 25 2021
This broke upstream bazel: https://buildkite.com/llvm-project/upstream-bazel-rbe/builds/10376#98340937-c4c7-4bd7-997e-dd56691da9d8
Was the header file missed?