- User Since
- Jan 26 2016, 2:17 AM (103 w, 2 d)
Thanks for pointing this out and this alternative approach to adjust the calling convention lowering looks more robust. What I mean by that is:
- although that approach does not come for free and involves creating a custom CCState and implementation,
- it avoids custom lowering of the loads/stores, and the few other corner cases that needed changing. As we don't this anymore, we are not risking having missed another corner case.
So I will start exploring the CCState approach.
Tue, Jan 16
This fixes most issues, now I am working on the only remaining failure
fp16-promote.ll, which is missing an upconvert.
Thu, Jan 11
Wed, Jan 10
Mon, Jan 8
Thanks for working on this!
Some comments inline.
Thanks, looks good to me. Just a few nits inlined, no need for another review.
Thu, Dec 21
Looks reasonable to me.
Dec 7 2017
Dec 6 2017
Dec 4 2017
Nov 16 2017
Looks good to me.
Nov 15 2017
Hi Sam, that's a very nice optimisation, but I think we need more tests. The "variables" that we have here are the values types (load byte, load half, load word), and the different possible offsets. Guessing a bit here, but that must be something like 3 * 3(?) = 9 tests.
And another question is how this interacts with -munaligned-access and -mno-unaligned-access?
Nov 7 2017
Nov 2 2017
Many thanks for the reviews and suggestions! Comments addressed.
Oct 31 2017
Addressed the comments about portability, and added a note that Float16 is available in both C and C++ mode.
Oct 16 2017
Oct 13 2017
Oct 12 2017
Thanks for reviewing/checking. The regexp class implements both BRE and ERE, but it defaults to using ERE. Thus FileCheck defaults to ERE. As the mode is not exposed to the user (with e.g. an option), I kept it simple and now say that FileCheck supports ERE.
Oct 11 2017
I would like to add links for BRE and ERE, to get a quick syntax overview. Not sure what page would be best for that, and if we e.g. can or like to link to wikipedia (https://en.wikipedia.org/wiki/Regular_expression#POSIX_basic_and_extended) or something else.
Oct 10 2017
Oct 4 2017
Just clarifying a typo in my previous comment:
- removed the f16 type from the calling conventions except for ARM_AAPCS_VFP. It's necessary to add them elsewhere, the default behaviour is what we want.
- added a test case using the hard float abi, but without fullfp16 support.
- added some comments about the HPR register class.
Oct 3 2017
Hi Sam, many thanks for checking! Cheers, Sjoerd.
Oct 2 2017
Added some more.
Realised that I am not testing hard float abi without fullfp16 support. And now see that code generation is not really optimal: it is generating eabi calls for the fp16 <-> fp conversions, whereas it should use the normal convert instructions. Fixing this.
Sep 29 2017
Looks good to me. Thanks.
Sep 27 2017
Sep 26 2017
Looks good to me (sorry I missed this in the other/previous review; I looked at it and thought it was alright...)
Sep 22 2017
looks reasonable to me
The context of this work was to recognise loop induction statements in Machine Instructions/Loops and to allow calculation of loop iteration counts. And isAdd provides more semantic information about a MI and helps recognising induction variables. Thus, this was the ground work for D27193, which was the next step to recognise loop induction. This patch was abandoned because the Quentin's comment was that the analysis was too ad-hoc and missing some formalism. The reason this is used only in the Hexagon backend, is because I refactored it and took it from the Hexagon backend. The Hexagon backend has a lot of MachineLoop/MI analysis and other goodness (sw pipeliner) that is mostly target independent and could be made available, but a few target specific hooks are required to do this.
Now, I really would like to pick up this work again because I still think it will be valuable, but need to rethink strategy after abandoning D27193. Probably that will be a some SVEC-like approach, but hopefully that can be a light-weight implementation to recognise simple loops (which is what we're doing anyway). This will probably be quite some work, so in the mean time, I really don't mind removing or moving isAdd back to the Hexagon backend. I could also dig up the exact requirements and document that if it's okay to keep it.
Sep 18 2017
Thanks for checking!
I was just replying, but yes, there are only 4 bits available to encode Vm, the other M bit is the index.
Hi, the full architecture specification is publicly available here:
Sep 15 2017
Looks reasonable to me.
Sep 13 2017
many thanks for reviewing and your help.
Fixed the typos, and added tests.
Sep 12 2017
Sep 11 2017
Sep 8 2017
Many thanks for reviewing and your help!
Sep 6 2017
I am going to commit this within a few days. That looks reasonable to me given that the comments in the last reviews were very minor (which I have of course addressed already). Also, in case of issues, I am guessing fixes and/or addition can be easily done post-commit, and if not a revert is cheap.
Aug 30 2017
Hi Sam, many thanks for refactoring this. Looks really good now! Just 2 nits inlined, but no need for another review.
Comments addressed. Thanks for reviewing.
Aug 29 2017
No changes were needed to make the conversions work, the existing logic is taking care of that, but I agree it doesn't hurt to add a few test cases. So I've added tests to both files, and cleaned up that comment.
Thanks for reviewing! I've addressed your comments: have set operation action to 'promote', added checks to test fcmp, and also added checks when fp16 is disabled.
Fixed typo in test case type declaration.
Forgot to mention that I will address the v8f16 types in a separate patch.
Aug 25 2017
Aug 24 2017
Thanks for fixing this.
Added test case.
Looks good to me, was wondering if you're also planning to add a Clang patch to test +rdm?
Aug 23 2017
Just to clarify my last comment, I will repeat the exercise I did in D36396 allowing f16 scalars, and will now start working on allowing f16 vector types. I will then also add tests for these.
I have removed the special case, and added the vector types.
For the copysign intrinsics working on these vectors, I have not yet added new test cases, because it looks like I first have to first do some work to allow f16 vectors.