Update to also cover float16 as per David's request.
Thu, May 28
Mar 9 2020
Feb 27 2020
Sep 6 2018
Apr 18 2018
Thanks, and I am going to try to get some clarity on this doc issue. But it looks like it should be "ARMv7, ARMv8", as it used to be. It makes sense to comment on this in the commit message, if that's what you mean.
Mar 26 2018
For what it is worth, this is the fix I would have suggested for the issue.
Mar 2 2018
Sorry, I should have caught this when we were reviewing this patch internally; this obviously leaves the lookups in:
v4_profile_ajax_getTopLevelCounters
v4_profile_ajax_getCodeForFunction
v4_profile_fwd2
Dec 18 2017
If this patch unconditionally defines _Float128, then I think it will conflict with the typedef for _Float128 for IEEE754 128-bit long double systems in glibc:
Nov 24 2017
So the change makes sense, but the references to Armv8.2-A look wrong to me.
I've checked the docs once again, and, indeed, they don't mention Armv8.2, so I'll remove the reference. However, judging from a cursory read of https://patchwork.kernel.org/patch/9275721/, it appears only Armv8.2 kernels would set EXECUTE_ONLY bits on the pages; kernels for Armv8. would set READ permissions on EXEC pages as well.
I'm confused by the title and rationale for this change. I don't think this has anything to do with Armv8.2-A and is instead a feature of Armv8-A.
Oct 17 2017
Sep 4 2017
Thanks for this. I don't know LNT well enough to review the code, but the top-level summary makes sense to me.
Aug 14 2017
Aug 11 2017
I did not check the implementation in detail, but this makes total sense to me. From my perspective this is a clear improvement and should go in.
Jul 11 2017
Hi Chris, this commit causes servers running from the wsgi script to fail. To reproduce:
Feb 22 2017
Having asked around: The way we define this is that the VABS instruction takes a signed integer and outputs an unsigned integer, getting around this problem.
However, I believe the output of VABS(INT_MIN) is indeed bit-identical to INT_MIN.
+ @jgreenhalgh to confirm I haven't mangled his explanation.
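To make the VABS(INT_MIN) point concrete, here is a minimal scalar sketch. It is not the intrinsic's actual definition: `abs_lane_bits` and `bits_of` are hypothetical helpers, and the model assumes a 32-bit two's complement lane. The negation is done in unsigned arithmetic so that it wraps rather than overflowing.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of one 32-bit lane: signed input, unsigned output.
// Negating in unsigned arithmetic wraps modulo 2^32 instead of overflowing.
std::uint32_t abs_lane_bits(std::int32_t x) {
    std::uint32_t bits = static_cast<std::uint32_t>(x);
    return x < 0 ? 0u - bits : bits;
}

// The raw bit pattern of a signed value, viewed as unsigned.
std::uint32_t bits_of(std::int32_t x) {
    return static_cast<std::uint32_t>(x);
}

// Small self-check of the claim discussed above.
void check_abs_lane_model() {
    const std::int32_t int_min = std::numeric_limits<std::int32_t>::min();
    // Read as unsigned, the result is the mathematically correct 2^31 ...
    assert(abs_lane_bits(int_min) == 0x80000000u);
    // ... and it is the same bit pattern as INT_MIN itself.
    assert(abs_lane_bits(int_min) == bits_of(int_min));
}
```

Under this model the output only looks wrong if it is read back as a signed integer; read as unsigned, it is the expected magnitude.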
Jan 20 2017
The feature bits set match my (limited) understanding of the various ThunderX variants, and their implementation in GCC. From that perspective this patch looks good to me.
Jan 12 2017
Oct 18 2016
Sep 22 2016
Sep 20 2016
Aug 18 2016
Presumably this is where the "faster than an FADD" comes from. This transform is FMUL + FADD + [use of FMUL] -> FMA + FMUL + [use of FMUL].
There are other cases, such as FADD + FMUL + FMA -> FMA + FMA. Probably a better way to describe the use of enableAggressiveFMAFusion is the relative cost of FMA to FADD and FMUL.
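For reference, the two shapes can be written out at source level. This is only a sketch: the function names are hypothetical, `std::fma` stands in for the FMA instruction, and the second shape is my assumption about which FADD + FMUL + FMA pattern is meant.

```cpp
#include <cassert>
#include <cmath>

struct MulAdd {
    double product;  // the FMUL result, which has another use
    double sum;      // the value the FADD (or FMA) produces
};

// Shape 1, before: FMUL + FADD + [use of FMUL].
MulAdd mul_then_add(double a, double b, double c) {
    double t = a * b;   // FMUL
    return {t, t + c};  // FADD; t stays live for its other use
}

// Shape 1, after: FMA + FMUL + [use of FMUL]. Same instruction count,
// but the sum no longer waits on the FMUL.
MulAdd fma_and_mul(double a, double b, double c) {
    double r = std::fma(a, b, c);  // FMA replaces the FADD
    double t = a * b;              // FMUL is kept for its other use
    return {t, r};
}

// Shape 2, before: FMUL + FMA + FADD.
double chain_before(double x, double y, double u, double v, double z) {
    return std::fma(x, y, u * v) + z;
}

// Shape 2, after: FMA + FMA, one instruction fewer.
double chain_after(double x, double y, double u, double v, double z) {
    return std::fma(x, y, std::fma(u, v, z));
}
```

With small integer inputs both forms of each shape give identical results. In general they can differ in the last bit, because an FMA rounds once where FMUL followed by FADD rounds twice.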
I also have concerns here. The TargetLowering hook states:
/// Return true if target always beneficiates from combining into FMA for a
/// given value type. This must typically return false on targets where FMA
/// takes more cycles to execute than *FADD*.
Whereas you say:
In spite of what the original author intended, I observed that the extra folds are worth it if FMA is as quick *FMUL* instead.
Which is correct? Or are you using this hook in a way the hook users don't intend? The wording used is vague and I really think we need to have more detail about what property of Exynos-M1 makes this good for Exynos but not for any other microarchitecture.
Apr 26 2016
Thanks, that looks perfect now.
Apr 25 2016
It'd be good if James could have a final look, but I'm ok with the change.
Apr 18 2016
Sorry to have missed this before commit.
Apr 15 2016
It is true that GCC would be more efficient in some cases (one example would be FP constants), but we would still fit into the definition of "no constraint whatsoever" and would therefore be correct, which is an improvement on the current situation, where we'll simply crash on this constraint.
I agree bad codegen trumps ICE crashes, but James mentioned it "might well break use cases". I'm interested in those...
James, do you have some pointers on the expected usage of this constraint in the wild? The more the merrier!
Apr 14 2016
Decaying this to "w" or "r" would potentially pessimize code generation, and might well break use cases. The whole point of "X" is to prevent the compiler from having to reload an operand you don't actually care about using in the output assembly (a scratch, or in that blog post, a hidden dependency). As GCC isn't going to put any effort into forcing the form of the operand, I'd expect most uses that actually try to print out an "X" constrained register to be using it as a shorthand for getting constants or labels out. There are normally more expressive constraints if that is what you need.