- User Since
- Jan 12 2017, 6:15 AM (196 w, 4 d)
Tue, Sep 29
Thanks @tpopp, that'll unblock all of us.
Mon, Sep 28
Committed as d5fd3d9b903e
Tue, Sep 22
SPEC 2017 on AArch64 is neutral on the geomean. The only slight worry is omnetpp with a 1% regression, but this is balanced by a 0.8% improvement on mcf. Other changes are in the noise.
Sep 18 2020
I know this has already been reverted, but just FYI: I've bisected a ~2% regression in SPEC2017 x264_r on AArch64 to this commit. Presumably this is due to the extra unrolling / cost modelling issue already mentioned?
Sep 17 2020
Fix for when there is no fp16 faddp + testing
LGTM, thanks for fixing this! Could you wait a day or two before committing to allow others to comment?
Sep 16 2020
Extend to f16, f32, f64 and i64
Rework to match faddp in AArch64 ISel lowering
Thanks for the feedback. I agree that ideally we'd be generating reduction intrinsics in IR and matching that in the backends. I don't think the pairwise add can be represented with the current intrinsics though: we'd need a <2 x float> variant, or a predicated version of the <4 x float> intrinsic to do this for strict FP math, I believe.
Sep 8 2020
Thanks @spatel. You're right that we miss that pattern, but it seems x86 currently misses it too (I don't read x86 very well, so I might be wrong). Using your faddp example:
Sep 7 2020
Jul 23 2020
Are you sure you can include config.h in an installed header file? AFAICT, config.h isn't installed, but llvm-config.h is.
Jul 13 2020
Jul 10 2020
Updates to address feedback, in particular:
Jul 6 2020
Jul 3 2020
Split out NFC rename
Jun 25 2020
Now with test changes
Ah, I missed the test changes this time round. Incoming.
May 28 2020
May 26 2020
May 7 2020
I'm running SPEC CPU intrate for this patch as well as this patch in combination with D78880.
May 6 2020
Do you think D68911 has a good chance of helping here? I can do a quick test run (quicker than finding a good reproducer) to see if it improves.
Hi, we're seeing a small (1.0%) regression in omnetpp_r in SPEC INT 2017 on AArch64 with LTO enabled that bisects to this patch. I should be able to reduce omnetpp_r to a small IR example that shows the changed AArch64 codegen, if that's useful. A revert is probably not necessary if all we need is an additional pattern or two in the AArch64 backend.
Apr 17 2020
I can report that in our testing on SPEC 2017, this pass fixes the mcf regression introduced with D76483.
Mar 31 2020
Thanks, much appreciated!
[...] It looks like SCEV can't see "through" the freeze node. [...]
I see. This link might be helpful: https://reviews.llvm.org/D70623
Mar 30 2020
Hi, I can confirm that D76010 unfortunately doesn't fix the regression.
Mar 27 2020
Hi, we're seeing a performance drop of 1.3% on SPEC 2017 mcf_r (compiled with LTO enabled) on AArch64 that bisects down to this patch. I'm testing whether D76010 happens to fix this regression (I'll comment when I get the results), but if not then this might need some investigation to see what's going on.
Jan 29 2020
Jan 28 2020
Address Eli's feedback; clarified commit message.
Jan 17 2020
Jan 16 2020
Fix tests, thanks rnk
Jan 15 2020
Jan 14 2020
Dec 23 2019
Dec 20 2019
Dec 13 2019
Dec 5 2019
Great, thanks for confirming.
Dec 3 2019
Nov 29 2019
This change appears to cause an assertion failure in clang during a Chromium build for Windows on Arm (AArch64). We suspect that it is also the cause of a mis-compilation when clang does not have assertions enabled, and causes a crash in some test cases. See https://crbug.com/1029385 for details.
Remove redundant TODO
Nov 28 2019
Is the plan to add the indexed variants too, so that they are treated in the same way?
Addressed @dmgreen's comments.
Nov 26 2019
Clarified commit message; fixed long lines.
Nov 25 2019
Added performance evaluation results
Nov 8 2019
Nov 7 2019
Enable -consider-local-interval-cost for AArch64 only instead of all targets.