- User Since
- Oct 21 2016, 1:19 AM (142 w, 5 d)
Functionally the patch looks good, but the title suggests this adds full inline-asm support for SVE (which would require the ACLE types proposed in D62960, as well as other changes), whereas this patch only adds support for specifying SVE registers in the clobber list.
Tue, Jul 2
Thu, Jun 27
The patch already landed, but also adding my LGTM. It seems like a case that I missed in my previous patch, thanks for fixing!
Thu, Jun 20
Thanks, I definitely want to take a look but I’m currently traveling without access to my laptop, so I won’t be able to properly review it until Wednesday.
Jun 17 2019
- Rebased patch.
- Following D60261, the scalar accumulator is either folded away or added separately as a scalar value, and so this patch only needs to focus on optimising the vector reduction itself.
Removing the recursion solves the compile-time issue and still catches all invalid cases.
Jun 14 2019
The output is not expected to change with this patch. I ran this test with current trunk and it passes, which confirms the behaviour doesn't change with this patch applied.
Not sure if there is still any benefit in committing this test separately then?
Jun 13 2019
Thanks for the review!
Jun 12 2019
- Changed: return IsDeferredCheck ? true : DeferCheck(Ty); => return IsDeferredCheck || DeferCheck(Ty);
- Use DeferredChecks.size() to determine what MatchIntrinsicTypesResult to return, instead of incorrectly relying on DeferredChecks.end() iterator.
- Removed 'dummy' intrinsics. I will post a separate patch that tests the behaviour tested with llvm.experimental.dummy.forward.struct.ret on existing intrinsics.
Jun 11 2019
Thanks for pointing that out; for some reason I did not get an automated email about this failure.
Thanks for the review!
Jun 8 2019
- Fixed ExpandReductions pass to use the accumulator value and updated corresponding tests.
Jun 7 2019
Superseded by D60261.
Abandoning this revision to move ahead with D60261 instead.
May 20 2019
May 17 2019
Good spot! I've updated these tests now.
- Updated more tests.
May 16 2019
For now the answer is yes. One of our primary concerns at the moment is adding basic SVE spill/fill support and we appreciate the caveat that nothing in LLVM really supports the concept of scalable types yet, including offsets and sizes.
I think you've given some compelling reasons to try the change in layout! I'll actually try this out downstream first before updating this patch, so I can run it through our SVE testing and see if there is any impact on performance or if I run into anything unexpected.
Thanks for clarifying!
Thanks for the prod! I've sent an update to the ML.
May 10 2019
While I don't want to hold up this patch, a name change to the type will be difficult to get through once the type is in and the terminology has settled, so it may be worth getting it right from the start. Personally I think <vscale x 16 x i8> is more explicit (and therefore easier to understand for those new to the type) than <scalable 16 x i8>. And now that we all have a better understanding and clear definition of scalable types, I wonder if <n x 16 x i8> instead of <vscale x 16 x i8> is worth considering again, since it is much easier to read in tests.
Thanks for your suggestions @efriedma!
May 7 2019
- Added more unittests for StackOffset.
- Added unit tests for StackOffset.
- Moved around file comments in AArch64StackOffset.h and fixed typo.
May 3 2019
We've actually experimented with various layouts and eventually chose this layout for our HPC compiler.
Added several asserts.
May 2 2019
Apr 30 2019
@SjoerdMeijer Yes the patch is indeed quite substantial in size :) In contrast to the original SVE MC patches though, these changes are only new TableGen instruction definitions and encoding classes, corresponding (dis)assembler tests and a few flags. For example, there aren't any code changes to e.g. implement/parse new operand types.
Apr 12 2019
LGTM. The compiler will have more opportunity to use cheaper shuffles if part of the (sub)vector is undef, so this looks like a general improvement.
Apr 11 2019
From looking at the A57 scheduler model in LLVM this seems like an improvement, with for example VUZPq taking twice as long as VUZPd. But in general I could imagine shuffles on shorter vectors to be cheaper.
It will probably also be easier to spot that one of the concatenated subvectors becomes fully undef after doing the shuffling on the subvector first.
Apr 10 2019
Thanks @spatel. I'll also create a follow-up patch to use a switch statement as you suggested!
- Removed redundant checks for predicate value.
- Renamed ArgV to Cmp
We found a case where repeated calls into computeKnownBits(FromAssume) skyrocketed our build times and caused some builds to time out, after we added a call to llvm.assume after vectorized loops to describe the range of a newly introduced induction variable.
Because the 'less than' or 'less than or equal' case was far down the list of matchers in this function, it had to go through all the preceding matching calls first. Unfortunately I don't have any real numbers to back this up, because we did this work a while ago.
Apr 5 2019
Apr 4 2019
- Renamed sgpr_spill -> sgpr-spill.
- Renamed TargetStackID::SGPR_Spill -> TargetStackID::SGPRSpill.
Apr 3 2019
Thanks! Yes, you're right that exposing the target-specific names/enums in target-independent code is less desirable, although note that this happens in some other places in LLVM as well; think for example of the enum in CallingConv. I made the trade-off to use the enum for several reasons:
- By using an enum, we avoid the need for target-specific MIR parsing of common properties.
- If we want to make it target-independent, we'll need to manually parse it as a string, and we lose the out-of-the-box YAML parsing, printing, verification and diagnostics that come with an enum value.
- If the value needs to be a string, it will have ugly quotes around it :)
- At the moment there seems to be only very limited use of stack-id; in trunk only the AMDGPU target uses it, as far as I know. Unless we care deeply about being able to programmatically generate a range of stack-ids at compile-time (with their meaning only known at compile-time), I don't think a single namespace is much of a limitation.
Apr 2 2019
Apr 1 2019
Mar 27 2019
Closing this patch manually. I accidentally changed the last digit to be D59636 when I committed this patch :)
Mar 25 2019
- This patch now uses Optional<unsigned> for getUnscaledLdSt()
- Separated out the non-functional changes from the functional change (the added cases in AArch64InstrInfo::getMemOpInfo).
- Some more simplification of the arithmetic in isAArch64FrameOffsetLegal.
Thanks @c-rhodes, LGTM.
Mar 22 2019
(not sure why Phabricator marked these comments as 'done' since they're not done yet)
Thanks @c-rhodes, this was indeed a mismatch with the SVE spec. Minor nit: the SVE spill/fill instructions also take ZPRAny operands, you could add some tests for those instructions as well to ensure it is not limited to movprfx alone.
Mar 21 2019
Mar 14 2019
@nikic thanks for pointing me to that discussion! I clearly misread the LangRef for this change :)
I agree it makes more sense to change these intrinsics so that the accumulator argument is always relevant (regardless of what flags are set), and to AutoUpgrade older IR. Given that I'm currently working on this, I'd be happy to move this forward with patches and a proposal/discussion on the mailing list to change the experimental reduction intrinsics. @aemerson you expressed an intention to work on this later this year; do you have any objection to me moving forward with it now?
Updated the tests to use a scalar accumulator value (instead of a vector accumulator value, which was wrong, but seemed to be completely ignored by SelectionDAGBuilder).