Added a test.
Wed, Sep 11
Moved the vmin range metadata into a separate patch.
Updated along with the patches it depends on. In particular, D67159 now needs a list of valid uses of the `__clang_arm_mve_alias` attribute, so this patch now has to have an extra piece of Tablegen backend that generates that list.
New version which renames the attribute to be MVE-specific, and locks it down as requested.
Addressed many review comments.
Mon, Sep 9
Reworked the patch to reverse the multiclass nesting order, avoiding the need to fake a Tablegen `if` using `foreach`.
Thu, Sep 5
Come to think of it, it would also not be too hard to constrain it to only be usable for a particular subset of builtins, and perhaps even only with a particular set of alias names for them. (I could easily derive all that information from the same Tablegen that arm_mve.h itself is made from.)
On the general discomfort with this attribute existing: I'd be happy to lock it down, or mark it as "not recommended" in some way, if that's any help. I don't personally intend any use of it outside a single system header file (namely arm_mve.h, which D67161 will introduce the initial version of).
Wed, Sep 4
Sorry about that – I didn't want to put the discussion of rationale in too many different places. The commit message for the followup patch D67161 discusses it a bit.
Aug 8 2019
Sam's suggestion to me for the ACLE intrinsics was that there should be an IR intrinsic that converts the i16 provided by the user into an <n x i1> for whatever n makes sense. In my unpushed (and unpolished) draft implementation there's also one that converts back again, which the ACLE intrinsics will need for the return value of vcmp. So it could be used here as well if that's useful.
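To make the two conversions concrete, here is a rough Python model of what such a pair of intrinsics might compute. It is only a sketch: the function names are made up, and it assumes that the 16 predicate bits are split into n equal groups, that the lowest bit of each group decides the lane, and that converting back sets every bit of a true lane's group. The real intrinsics may well be defined differently.

```python
from typing import List


def pred_i16_to_lanes(pred: int, n: int) -> List[bool]:
    """Expand a 16-bit predicate mask into n per-lane booleans (hypothetical model)."""
    if n not in (2, 4, 8, 16):
        raise ValueError("lane count must divide 16")
    if not 0 <= pred <= 0xFFFF:
        raise ValueError("predicate must fit in 16 bits")
    bits_per_lane = 16 // n
    # Assumption: the lowest bit of each lane's group decides that lane.
    return [bool((pred >> (i * bits_per_lane)) & 1) for i in range(n)]


def lanes_to_pred_i16(lanes: List[bool]) -> int:
    """Pack per-lane booleans back into a 16-bit mask (hypothetical model)."""
    n = len(lanes)
    if n not in (2, 4, 8, 16):
        raise ValueError("lane count must divide 16")
    bits_per_lane = 16 // n
    group = (1 << bits_per_lane) - 1
    pred = 0
    for i, lane in enumerate(lanes):
        if lane:
            # Assumption: a true lane sets every bit in its group.
            pred |= group << (i * bits_per_lane)
    return pred
```

For example, under these assumptions `pred_i16_to_lanes(0x00F0, 4)` marks only lane 1 as true, and packing that result again gives back `0x00F0`.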
Hmm. I may have underestimated the difficulty, then! My thought was that all I really needed was to arrange that if you already had a v2i1, you could use it as the predicate operand for one of the 64-bit-lane instructions and not have instruction selection fail, and perhaps if you already had a pair of them you could do bitwise ops between them just like you can with all the others.
Aug 7 2019
Jul 23 2019
Jul 19 2019
Jul 17 2019
So it sounds as if the most immediate problem is that there now _isn't_ an architecture you can specify on clang's command line that permits the union of all the instructions you plan to use?
Hmmm. This surely can't be the first time a case like this has come up. What's the usual solution in other similar situations, when you want to include code for mutually incompatible architectures in the same object because you're going to test at run time which one to execute?
Jul 11 2019
Jul 10 2019
Jul 8 2019
Jul 4 2019
Jul 2 2019
Jul 1 2019
Revised patch is intended to apply after D63938 rather than before.
Revised patch fixes those two target triples in the tests, and makes the setAllExpand calls for f32 and f64 conditional on different things. Also, to make that less cumbersome, I've moved into setAllExpand itself a few re-enabling setOperationAction calls that otherwise had to be repeated after every single call to it.
I can't shed as much light as you might hope, I'm afraid, but in D62998 my intention was not to make -mcpu=anything win over -mfpu=anything. It was to make an explicit request to enable a feature win over an implicit request to disable it. It so happened in my example that the explicit request was in the -mcpu option.
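The precedence rule described here can be sketched as a small Python model. This is not the clang driver's code: `Request` and `resolve` are invented for illustration, and the only rule modelled is that an implicit disable never undoes an explicit enable. Everything else is assumed to be "later request wins".

```python
from typing import Dict, List, NamedTuple, Tuple


class Request(NamedTuple):
    feature: str    # feature name, e.g. "fp" (illustrative)
    enable: bool    # True to enable, False to disable
    explicit: bool  # True if the user asked for this directly


def resolve(requests: List[Request]) -> Dict[str, bool]:
    """Apply requests in order; an implicit disable cannot undo an explicit enable."""
    state: Dict[str, Tuple[bool, bool]] = {}
    for req in requests:
        current = state.get(req.feature)
        explicitly_enabled = current == (True, True)
        if explicitly_enabled and not req.enable and not req.explicit:
            continue  # the explicit enable wins, regardless of option order
        # Assumption: in all other cases the later request simply wins.
        state[req.feature] = (req.enable, req.explicit)
    return {feature: enabled for feature, (enabled, _) in state.items()}
```

With this model, an explicit enable followed by an implicit disable of the same feature leaves it enabled, whichever option each request happened to come from.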
Apparently it would: trying it that way, it does look simpler, and as far as I can tell it doesn't change the output of any currently checked-in Tablegen input.
Jun 28 2019
Sorry about that. rL364635 should fix it.
Jun 27 2019
Changed my mind about the opt-in system: instead of using a subclass of OperandWithDefaultOps, I've switched to using a flag field inside the existing class.
Jun 26 2019
Jun 25 2019
Rebased this patch to current trunk, and also fixed a test failure by adding arm_aapcs_vfpcc to the test functions that use MVE vector types (since we can't support passing vector types in GPRs until we get all the operations like build_vector and insert_element fully supported).
Revised this patch to work properly with all the other MVE-related changes we've been making.
Addressed all review comments.
Addressed all review comments, I think.
Jun 24 2019
I committed this as rL364172 this morning, but accidentally left off the Phabricator footer (sorry). Now closing.
Minor revisions to this patch: changed a couple of legacy t2rGPR into rGPR (after the former was withdrawn during code review of D63650). Also added a small knock-on fix in checkTargetMatchPredicate, preventing an assertion failure in a check that was specific to rGPR.
Reworked the remaining loads and stores to address review comments, implement consistent naming of instruction ids, and tidy up the Tablegen so that it's hopefully halfway readable.
Updated instruction spellings in line with intended consistent MVE practice, and also reworked the WLSTP/DLSTP and LETP/LCTP definitions to remove pointless !if from the base classes. (In particular, the instructions that don't have a label field in the encoding now don't have one in their Tablegen defs either.)
Jun 21 2019
Addressed all review comments.
I've decided this patch is too large to manage all in one go. Also, the interleaving family of loads (VLD20 and friends) share essentially no infrastructure with the VLDR family, so that seems like a natural place to split the patch in two.
Revised the VIDUP immediate operand handling to distinguish the general concept, 'power of 2 which is encoded as a left-shift count in the instruction', from the specific case used in VIDUP, which takes a fixed range of inputs and has a custom DiagnosticString (as you suggested) explaining what it's used for.
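As an illustration of that split, the following Python sketch separates a generic "power of 2 encoded as a shift count" helper from a VIDUP-specific wrapper with its own error text. The function names and the diagnostic wording are invented, and the sketch assumes the VIDUP immediate may be 1, 2, 4 or 8; it is not the actual operand class from the patch.

```python
def encode_pow2_shift(imm: int, max_shift: int) -> int:
    """Generic case: encode a power of two as its left-shift count."""
    if imm <= 0 or imm & (imm - 1):
        raise ValueError("immediate must be a power of two")
    shift = imm.bit_length() - 1
    if shift > max_shift:
        raise ValueError("immediate out of range")
    return shift


def encode_vidup_imm(imm: int) -> int:
    """Specific case: fixed range of inputs plus a purpose-specific diagnostic."""
    try:
        # Assumption: valid increments are 1, 2, 4, 8, i.e. shift counts 0..3.
        return encode_pow2_shift(imm, max_shift=3)
    except ValueError:
        # Hypothetical wording standing in for the custom DiagnosticString.
        raise ValueError("vector increment immediate must be 1, 2, 4 or 8") from None
```

The generic helper stays reusable for other operands, while the wrapper is the only place that knows what the value means for VIDUP.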
Addressed review comment about Mnemonic.startswith("vmul"), and also updated this patch to use the existing NEON complex rotation operands (with matching change in the test to expect the new nicer error messages).