- User Since
- Jan 12 2017, 6:15 AM (202 w, 2 d)
Fri, Nov 6
Hi, this is regressing a few internal workloads (physics simulations, AArch64) by a few percent. Did you do any performance measurements for this change?
Tue, Nov 3
Reverted due to the new test failing on a bunch of buildbots. I'll try again tomorrow; it looks like the other pipeline tests manage to work around it.
Oct 21 2020
@SjoerdMeijer yeaahhh these pipeline tests are a bit of a pain, but nothing some big-brain sed scripting can't solve.
Added LTO pipeline test
Sep 29 2020
Thanks @tpopp, that'll unblock all of us.
Sep 28 2020
Committed as d5fd3d9b903e
Sep 22 2020
SPEC 2017 on AArch64 is neutral on the geomean. The only slight worry is omnetpp with a 1% regression, but this is balanced by a 0.8% improvement on mcf. Other changes are in the noise.
Sep 18 2020
I know this has already been reverted but just FYI that I've bisected a ~2% regression in SPEC2017 x264_r on AArch64 to this commit. Presumably this is due to the extra unrolling / cost modelling issue already mentioned?
Sep 17 2020
Fix for when there is no fp16 faddp + testing
LGTM, thanks for fixing this! Could you wait a day or two before committing to allow others to comment?
Sep 16 2020
Extend to f16, f32, f64 and i64
Rework to match faddp in AArch64 ISel lowering
Thanks for the feedback. I agree that ideally we'd be generating reduction intrinsics in IR and matching that in the backends. I don't think the pairwise add can be represented with the current intrinsics though: we'd need a <2 x float> variant, or a predicated version of the <4 x float> intrinsic to do this for strict FP math, I believe.
Sep 8 2020
Thanks @spatel. You're right that we miss that pattern, but it seems x86 currently does too (I don't read x86 very well, so I might be wrong). Using your faddp example:
Jul 23 2020
Are you sure you can include config.h in an installed header file? AFAICT, config.h isn't installed, but llvm-config.h is.
Jul 10 2020
Updates to address feedback, in particular:
Jul 3 2020
Split out NFC rename
Jun 25 2020
Now with test changes
Ah, I missed the test changes this time round. Incoming.
May 7 2020
I'm running SPEC CPU intrate for this patch as well as this patch in combination with D78880.
May 6 2020
Do you think D68911 has a good chance of helping here? I can do a quick test run (quicker than finding a good reproducer) to see if it improves.
Hi, we're seeing a small (1.0%) regression in omnetpp_r in SPEC INT 2017 on AArch64 with LTO enabled that bisects to this patch. I should be able to reduce omnetpp_r to a small IR example that shows the changed AArch64 codegen, if that's useful. A revert is probably not necessary if all we need is an additional pattern or two in the AArch64 backend.
Apr 17 2020
I can report that in our testing on SPEC 2017, this pass fixes the mcf regression introduced with D76483.
Mar 31 2020
Thanks, much appreciated!
[...] It looks like SCEV can't see "through" the freeze node. [...]
I see. This link might be helpful: https://reviews.llvm.org/D70623
Mar 30 2020
Hi, I can confirm that D76010 unfortunately doesn't fix the regression.
Mar 27 2020
Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.
Jan 28 2020
Address Eli's feedback; clarified commit message.
Jan 16 2020
Fix tests, thanks rnk
Dec 5 2019
Great, thanks for confirming.