- User Since
- Oct 21 2016, 1:19 AM (125 w, 6 d)
Thu, Mar 14
@nikic thanks for pointing me to that discussion! I clearly misread the LangRef for this change :)
I agree it makes more sense to change these intrinsics and to make its accumulator argument always relevant (regardless of what flags are set) and AutoUpgrading older IR. Given that I'm currently working on this, I'd be happy to move this forward with patches and a proposal/discussion on the mailing list to change the experimental reduction intrinsics. @aemerson you expressed an intention to work on it later this year, do you have any objection to me moving forward with this now?
Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).
Wed, Mar 13
This patch now matches ISD::VECREDUCE_FADD directly, and shows the diff with test/CodeGen/AArch64/vecreduce-fadd.ll.
Tue, Mar 12
Mon, Mar 11
Thanks for updating your patch @sanjoy ! LGTM (with a side-note on a follow up patch to take more specific flags in expandReductions).
You may want to update the commit message before you commit :)
Thanks for making these changes, I think the patch looks good now!
Wed, Mar 6
Thanks for these changes to your patch @sanjoy! The patch is looking in good shape, just a few remaining nits mostly.
Tue, Feb 26
Thanks for this patch @nikic! Please find some comments inline.
Wed, Feb 20
Dec 31 2018
Nov 27 2018
The patch looks fine to me (with nit on the const_cast)
Nov 26 2018
- resolved editorial comments.
Just to double check before committing, @aaron.ballman are you happy with the tests?
Thanks for making this change, I think having the interface more generic makes sense. Please find some comments inlined.
Nov 19 2018
Thanks all for the suggestions and comments! I've updated the patch with a better description of the attribute's behaviour (thanks @rjmccall for the starting point!) and added Sema tests.
Nov 13 2018
- Removed _aarch64_vector_pcs and __aarch64_vector_pcs keywords in favour of supporting only __attribute__(aarch64_vector_pcs)).
Nov 12 2018
Sep 6 2018
Sep 4 2018
- Calculate CSStackSize by accumulating reg-size for each saved register.
- Only sets PairedReg in SavedRegs when it is not AArch64::NoRegister.
- Removed trailing whitespace from test.
Sep 3 2018
The code in hasLoadFromStackSlot filters out MMO's that have a pseudo source value of type Stack. If that function populates an array of only those MMOs, I think its safe to do a cast<FixedStackPseudoSourceValue>(mmo->getPseudoValue()) and get rid of the FrameAccess struct.
Aug 31 2018
- Some refactoring to maintain CSStackSize.
- Migrated .ll test to a .mir test.
Fixed issue with incorrect size of MachineMemOperand.
Aug 30 2018
Aug 17 2018
Added negative tests and moved the tests to their own files test/MC/AArch64/SVE (system-regs.s and system-regs-diagnostics.s).
These system registers were missing from my previous SVE MC patches.
Jul 30 2018
- Updated to reflect changes in D49592 to use TSFlags instead of separate table for annotating instructions as destructive operations.
- Added negative tests for new SVE instructions that were added since uploading the previous patch.
Updated the patch to use TSFlags instead of a separate look-up table to annotate instructions as being destructive operations.
Jul 27 2018
Hi Sjoerd, thanks for your feedback! We actually use the TSFlags in our downstream assembler, but had a few reasons to re-implement it with a GenericTable instead:
- The TSFlags are used for all instructions, taking up global bits for only a relatively small selection of instructions, where these bits make little sense to non-movprfxable instructions. To better utilize the (potentially) cheap move/zeroing capabilities of MOVPRFX, we'll need more fields/bits to describe the kind of operation e.g. whether it is unary, binary, commutative, whether it has a reversed operation (e.g. sub and subr), which in our downstream compiler takes a total of 9 bits. With this in mind, I thought it made more sense to describe this in a separate table instead.
- The table is not queried that often (only when encountered together with a MOVPRFX), so there is little runtime overhead.
Jul 20 2018
It is probably worth noting that the positive and negative tests to explicitly test that instructions (and/or specific forms of those instructions) are or are not movprfx-able, are auto-generated using a separate tool (so my apologies for the large diff :)). The movprfx-diagnostics.s is worth pointing out as testing the specific rules of the MOVPRFX instruction.
Jul 4 2018
Jul 3 2018
Jul 2 2018
Jun 18 2018
Thanks @fhahn I've added a comment to regsEqual describing what it can be used for.
Jun 15 2018
Jun 14 2018
Jun 6 2018
Merged k_ExactFPImm with k_FPImm.
Jun 5 2018
Hi @olista01 , I just tried merging this change with k_FPImm and one of the problems I'm running into is with the '#0.0' and InstAliases.
For example in SVEInstrFormats.td, class sve_int_dup_imm:
InstAlias<"fmov $Zd, #0.0", (!cast<Instruction>(NAME # _S) ZPR32:$Zd, 0, 0), 1>;
The k_FPImm case works a little different.. Instead of it storing the APFloat value, it stores the (AArch64_AM) encoded FP imm value. If the value cannot be encoded, the parser throws an error. Also not all values can be encoded, such as '0.0', for which the parser (for k_FPImm) puts in a "#0.0" string literal into the parsed operands list. The two cases can be merged, but I think it may be better to do that in a separate patch as this will require some refactoring and changes to match e.g. the #0.0 case in existing FP instructions.
Jun 4 2018
Jun 3 2018
Jun 1 2018
Removed some redundant test-cases for fmov alias that are already tested by test/MC/AArch64/SVE/fdup.s
I think the difference with other immediates that have a limited set, is that these immediates have a non-trivial parsing/decoding/printing. Rather than just allowing a given range of integer values the string is first parsed as FP value and then encoded as integer value between 0-255. And the opposite for the decode. I think it makes sense to test these cases individually, although I can see how doing this for the FMOV alias as well could be a bit excessive, so I'll reduce those.
Since the immediate encoding only allows for a limited set of 256 different floating-point immediates, I thought it would make sense to test each of them separately to make sure they are all correctly parsed, assembled and printed, since basically these are all the edge-cases. The negative tests are much smaller and test a few floating point values that cannot be encoded. I can easily reduce the tests, but I think these tests are valuable. What do you think?