- User Since
- Jan 12 2017, 6:15 AM (221 w, 5 d)
Since this is mostly just moving pre-existing code, I think it's fine to address the style issues in a separate NFC commit. I believe all comments have been addressed, but please wait a day or two before committing in case there is anything else. LGTM.
+1 on eagerly awaiting a fix. a 3% regression on astar (AArch64, LTO) bisects to @lebedev.ri 's revert: https://reviews.llvm.org/rG6270b3a1eafaba4279e021418c5a2c5a35abc002 .
Thanks! This seems to be moving in the right direction. Now that SCCPInstVisitor is separate, its declaration can move into SCCPSolver.cpp entirely.
Wed, Apr 7
Good spot! Thanks!
Thu, Apr 1
I think the issue here is that quite a number of implementation details are leaking into the header file where they don't belong because they are not part of the public interface. You're right that code using the header file doesn't have access to these by virtue of being private, but this does not make for a very readable header file, IMO.
Tue, Mar 30
Mar 8 2021
Mar 5 2021
Mar 1 2021
Add a few more tests and some further test reduction.
Feb 24 2021
Feb 18 2021
Committed as https://reviews.llvm.org/rG93d9a4c95aff . Looks like I forgot to add the "Differential Revision:" line :(
Feb 17 2021
I've adopted your suggestions, thanks!
Feb 15 2021
Feb 8 2021
This looks good to me. @fhahn are you happy with the suggested approach to tighten up the SCCPSolver class in subsequent patches, when function inlining starts using it?
Maybe I'm missing something, but this change doesn't seem to be effective anymore after the new pass manager switcheroo. Did this pass not get ported, in favour of SimpleLoopUnswitch? This now shows up as a regression relative to (what will be) LLVM 12.
Feb 4 2021
This looks like a fine first step to separate SCCPSolver from SCCP. I agree with @fhahn that the SCCPSolver class should be improved by pulling out implementation details into (separate, hidden) classes, but this is probably best reviewed in separate patches. At least it looks like the public interface of SCCPSolver is already restricted to what is needed by SCCP.
Feb 3 2021
Hey, we've run into a case where this patch causes a dead loop to appear which doesn't subsequently get removed. It's not a huge deal (we're seeing some great speed-ups from this patch too!), but a niggle that might be nice to address. Reproducer:
Jan 27 2021
As the approach here is pretty much identical to D36432, do you have any plans on the cost model for this pass? The earlier proposal seems to have been abandoned due to a lack of good heuristics to trade off between the (potentially huge) increase in code size (as well as compile time) and performance improvements (which may or may not arise due to later optimizations). Perhaps it would help to look at what trade-offs GCC makes? Specifically, it would be good to have an idea of how you would implement getSpecializationCost and getSpecializationBonus.
Jan 18 2021
Looks good to me!
Jan 15 2021
Jan 14 2021
So on SPEC 2006 this fixes astar (+9.6%) as well as shakes things up enough to "fix" h264ref (+9.0%, see D93946). Other notable changes are libquantum (+3.2%) and omnetpp (-2.8%). Geomean is +1.5%.
Turns out the failing benchmarks are due to a miscompilation with -g3 (which we add to profiled runs). The patch does seem to make that miscompilation more likely. I'll try to reduce that separately, but at least I'll have some performance numbers shortly.
Jan 12 2021
Hi folks, I'm afraid an 8% regression in h264ref (SPEC INT 2006) bisects to this patch. This in on AArch64 with -flto on a Neoverse-N1 based system, but I wouldn't be surprised if this showed up elsewhere too.
Jan 8 2021
Yep, definitely a problem on our end; I've got similar symptoms in other runs. Sorry about the noise, and I will try to get some perf data once we sort this out.
Our flags are -mcpu=native -O3 -fomit-frame-pointer -flto on a Neoverse-N1 based system, so -mcpu=neoverse-n1 should do the same thing.
Jan 7 2021
@fhahn I'm afraid I'm seeing runtime and verification errors on perlbench, gcc, gobmk, and astar in SPEC INT 2006. I'm guessing something later in the pipe really doesn't like it when certain loops aren't rotated?
Thanks, Florian. I'll give the patch a run through our benchmarking.
@pzheng I don't have perf data for Cortex-A57, but I would certainly expect to see an uplift there too.
Jan 6 2021
Dec 15 2020
Hi, our flags are "-Ofast -flto -mcpu=native -fomit-frame-pointer", where "native" is a Neoverse-N1 system. Let me know if that helps.
Dec 1 2020
Just wanted to add to @fhahn that we're seeing this too, as an even more notable 8% regression (same benchmark, different hardware). Florian's explanation matches what I'm seeing, but I hadn't had a chance to confirm what was happening.
Nov 10 2020
Nov 9 2020
Nov 6 2020
Hi, this is regressing a few internal workloads (physics simulations, AArch64) by a few percent. Did you do any performance measurements for this change?
Nov 3 2020
Reverted due to new test failing on a bunch of buildbots. I'll try again tomorrow, looks like the other pipeline tests manage to work around it.
Oct 21 2020
@SjoerdMeijer yeaahhh these pipeline tests are a bit of a pain, but nothing some big-brain sed scripting can't solve.
Added LTO pipeline test
Sep 29 2020
Thanks @tpopp, that'll unblock all of us.
Sep 28 2020
Committed as d5fd3d9b903e
Sep 22 2020
SPEC 2017 on AArch64 is neutral on the geomean. The only slight worry is omnetpp with a 1% regression, but this is balanced by a .8% improvement on mcf. Other changes are in the noise.
Sep 18 2020
I know this has already been reverted but just FYI that I've bisected a ~2% regression in SPEC2017 x264_r on AArch64 to this commit. Presumably this is due to the extra unrolling / cost modelling issue already mentioned?
Sep 17 2020
Fix for when there is no fp16 faddp + testing
LGTM, thanks for fixing this! Could you wait a day or two before committing to allow others to comment?
Sep 16 2020
Extend to f16, f32, f64 and i64
Rework to match faddp in AArch64 ISel lowering
Thanks for the feedback. I agree that ideally we'd be generating reduction intrinsics in IR and matching that in the backends. I don't think the pairwise add can be represented with the current intrinsics though: we'd need a <2 x float> variant, or a predicated version of the <4 x float> intrinsic to do this for strict FP math, I believe.
Sep 8 2020
Thanks @spatel . You're right that we miss that pattern, but, so does x86 currently it seems (I don't read x86 very well so I might be wrong). Using your faddp example:
Sep 7 2020
Jul 23 2020
Are you sure you can include config.h in an installed header file? AFAICT, config.h isn't installed, but llvm-config.h is.
Jul 13 2020
Jul 10 2020
Updates to address feedback, in particular: