- User Since
- Oct 21 2016, 1:19 AM (134 w, 4 d)
Fri, May 17
Good spot! I've updated these tests now.
- Updated more tests.
Thu, May 16
For now the answer is yes. One of our primary concerns at the moment is adding basic SVE spill/fill support and we appreciate the caveat that nothing in LLVM really supports the concept of scalable types yet, including offsets and sizes.
I think you've given some compelling reasons to try the change in layout! I'll actually try this out downstream first before updating this patch, so I can run it through our SVE testing and see if there is any impact on performance or if I run into anything unexpected.
Thanks for clarifying!
Thanks for the prod! I've sent an update to the ML.
Fri, May 10
While I don't want to hold up this patch, a name change to the type will be difficult to get through once the type is in and the terminology has settled, so it may be worth getting it right from the start. Personally I think <vscale x 16 x i8> is more explicit (and therefore easier to understand for those new to the type) than <scalable 16 x i8>. And now that we all have a better understanding and clear definition of scalable types, I wonder if <n x 16 x i8> instead of <vscale x 16 x i8> is worth considering again, since it is much easier to read in tests.
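To illustrate the spelling under discussion, here is a minimal IR sketch using the `<vscale x 16 x i8>` form; the surrounding load is illustrative only, not taken from the patch:

```llvm
; A scalable vector of i8: 16 lanes multiplied by the runtime
; constant vscale. The load below is just illustrative context.
define <vscale x 16 x i8> @read(<vscale x 16 x i8>* %p) {
  %v = load <vscale x 16 x i8>, <vscale x 16 x i8>* %p
  ret <vscale x 16 x i8> %v
}
```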
Thanks for your suggestions @efriedma!
Tue, May 7
- Added more unittests for StackOffset.
- Added unit tests for StackOffset.
- Moved around file comments in AArch64StackOffset.h and fixed typo.
Fri, May 3
We've actually experimented with various layouts and eventually chose this layout for our HPC compiler.
Added several asserts.
Thu, May 2
Tue, Apr 30
@SjoerdMeijer Yes the patch is indeed quite substantial in size :) In contrast to the original SVE MC patches though, these changes are only new TableGen instruction definitions and encoding classes, corresponding (dis)assembler tests and a few flags. For example, there aren't any code changes to e.g. implement/parse new operand types.
Apr 12 2019
LGTM. The compiler will have more opportunity to use cheaper shuffles if part of the (sub)vector is undef, so this looks like a general improvement.
Apr 11 2019
From looking at the A57 scheduler model in LLVM this seems like an improvement, with for example VUZPq taking twice as long as VUZPd. But in general I could imagine shuffles on shorter vectors to be cheaper.
It will probably also be easier to spot that one of the concatenated subvectors becomes fully undef after doing the shuffling on the subvector first.
Apr 10 2019
Thanks @spatel. I'll also create a follow-up patch to use a switch statement as you suggested!
- Removed redundant checks for predicate value.
- Renamed ArgV to Cmp
We found a case where repeated calls into computeKnownBits(FromAssume) skyrocketed our build times after we added a call to llvm.assume after vectorized loops to describe the range of a newly introduced induction variable, causing some builds to time out.
Because the 'less than' or 'less than or equal' case was far down the list in this function, it had to go through all the preceding matching calls. Unfortunately I don't have any real numbers to back this up, because we did this work a while ago.
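For context, a hypothetical sketch of the kind of assume emitted after a vectorized loop to describe the range of the new induction variable; the function and value names here are illustrative, not from the original code:

```llvm
declare void @llvm.assume(i1)

; Hypothetical: after the vectorized loop, tell later analyses that
; the new induction variable's final value stays below the bound.
define void @after_loop(i64 %iv.final) {
  %in.range = icmp ult i64 %iv.final, 1024
  call void @llvm.assume(i1 %in.range)
  ret void
}
```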
Apr 5 2019
Apr 4 2019
- Renamed sgpr_spill -> sgpr-spill.
- Renamed TargetStackID::SGPR_Spill -> TargetStackID::SGPRSpill.
Apr 3 2019
Thanks! Yes, you're right that exposing the target-specific names/enums in target-independent code is less desirable, although note that this happens in some other places in LLVM as well; think for example of the enum in CallingConv. I made the trade-off to use the enum for several reasons:
- By using an enum, we avoid target-specific MIR parsing of common properties.
- If we want to make it target-independent, we'll need to manually parse it as a string, and we lose the out-of-the-box YAML parsing, printing, verification and diagnostics that come with an enum value.
- If the value needs to be a string, it will have ugly quotes around it :)
- At the moment there seems to be only very limited use of stack-id; in trunk only the AMDGPU target uses it as far as I know. Unless we care deeply about allowing targets to programmatically generate a range of stack-ids at compile-time (with their meaning only known at compile-time), I don't think it is much of a limitation to have a single namespace.
Apr 2 2019
Apr 1 2019
Mar 27 2019
Closing this patch manually. I accidentally changed the last digit to be D59636 when I committed this patch :)
Mar 25 2019
- This patch now uses Optional<unsigned> for getUnscaledLdSt()
- Separate out the non-functional changes from the functional change (the added cases in AArch64InstrInfo::getMemOpInfo)
- Some more simplification of the arithmetic in isAArch64FrameOffsetLegal.
Thanks @c-rhodes, LGTM.
Mar 22 2019
(not sure why Phabricator marked these comments as 'done' since they're not done yet)
Thanks @c-rhodes, this was indeed a mismatch with the SVE spec. Minor nit: the SVE spill/fill instructions also take ZPRAny operands, you could add some tests for those instructions as well to ensure it is not limited to movprfx alone.
Mar 21 2019
Mar 14 2019
@nikic thanks for pointing me to that discussion! I clearly misread the LangRef for this change :)
I agree it makes more sense to change these intrinsics and to make its accumulator argument always relevant (regardless of what flags are set) and AutoUpgrading older IR. Given that I'm currently working on this, I'd be happy to move this forward with patches and a proposal/discussion on the mailing list to change the experimental reduction intrinsics. @aemerson you expressed an intention to work on it later this year, do you have any objection to me moving forward with this now?
Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).
Mar 13 2019
This patch now matches ISD::VECREDUCE_FADD directly, and shows the diff with test/CodeGen/AArch64/vecreduce-fadd.ll.
Mar 12 2019
Mar 11 2019
Thanks for updating your patch @sanjoy ! LGTM (with a side-note on a follow up patch to take more specific flags in expandReductions).
You may want to update the commit message before you commit :)
Thanks for making these changes, I think the patch looks good now!
Mar 6 2019
Thanks for these changes to your patch @sanjoy! The patch is looking in good shape, just a few remaining nits mostly.
Feb 26 2019
Thanks for this patch @nikic! Please find some comments inline.
Feb 20 2019
Dec 31 2018
Nov 27 2018
The patch looks fine to me (with a nit on the const_cast).
Nov 26 2018
- resolved editorial comments.
Just to double check before committing, @aaron.ballman are you happy with the tests?
Thanks for making this change, I think having the interface more generic makes sense. Please find some comments inlined.
Nov 19 2018
Thanks all for the suggestions and comments! I've updated the patch with a better description of the attribute's behaviour (thanks @rjmccall for the starting point!) and added Sema tests.
Nov 13 2018
- Removed _aarch64_vector_pcs and __aarch64_vector_pcs keywords in favour of supporting only __attribute__((aarch64_vector_pcs)).
Nov 12 2018
Sep 6 2018
Sep 4 2018
- Calculate CSStackSize by accumulating reg-size for each saved register.
- Only set PairedReg in SavedRegs when it is not AArch64::NoRegister.
- Removed trailing whitespace from test.
Sep 3 2018
The code in hasLoadFromStackSlot filters out MMOs that have a pseudo source value of type Stack. If that function populates an array of only those MMOs, I think it's safe to do a cast<FixedStackPseudoSourceValue>(mmo->getPseudoValue()) and get rid of the FrameAccess struct.
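A hedged sketch of the suggested shape; this assumes hasLoadFromStackSlot returns only the MMOs whose pseudo source value is of Stack kind, so the cast is safe (the loop body is illustrative, not upstream code):

```cpp
// Sketch only: iterate the pre-filtered stack MMOs and cast directly,
// removing the need for a separate FrameAccess struct.
for (const MachineMemOperand *MMO : Accesses) {
  const auto *PSV = cast<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
  int FrameIdx = PSV->getFrameIndex();
  // ... use FrameIdx and MMO directly.
}
```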
Aug 31 2018
- Some refactoring to maintain CSStackSize.
- Migrated .ll test to a .mir test.
Fixed issue with incorrect size of MachineMemOperand.