- User Since
- Jan 6 2015, 6:21 AM (326 w, 4 d)
Jan 29 2021
[NOT READY FOR REVIEW]
Jan 25 2021
OK, I see where you are coming from now. LoopVectorize keeps the shuffle result full by widening the load+shuffle to double width. LV's double-wide choice seems like a weird one, but I suppose if that sequence is codegen'd correctly, it will work out.
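A rough model of the double-wide sequence, as a Python sketch. It assumes a factor-2 interleave group on fixed-width vectors. `shufflevector` and `deinterleave2_wide` are hypothetical helpers, not LLVM APIs, and poison lanes are represented by None.

```python
from typing import List, Sequence, Tuple


def shufflevector(v1: Sequence, v2: Sequence, mask: Sequence[int]) -> List:
    """Model of IR shufflevector on fixed-width vectors: each mask entry
    indexes into the concatenation v1 ++ v2 (undef mask lanes not modeled)."""
    src = list(v1) + list(v2)
    return [src[i] for i in mask]


def deinterleave2_wide(memory: Sequence, base: int, vf: int) -> Tuple[List, List]:
    """Hypothetical factor-2 interleave group: one double-wide load of
    2*vf contiguous elements, then stride-2 shuffles pick the even/odd lanes."""
    wide = list(memory[base:base + 2 * vf])  # the double-wide load
    poison = [None] * len(wide)              # stands in for a poison operand
    evens = shufflevector(wide, poison, [2 * i for i in range(vf)])
    odds = shufflevector(wide, poison, [2 * i + 1 for i in range(vf)])
    return evens, odds
```

For example, with memory [10, 11, 20, 21, 30, 31, 40, 41] and vf = 4, this yields [10, 20, 30, 40] and [11, 21, 31, 41]. Every lane of each shuffle result is populated from the single wide load.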
Jan 22 2021
Jan 19 2021
Having said that, I wonder if we should revisit the idea of allowing shuffle vectors to accept step vector masks?
Jan 15 2021
In D94444, @paulwalker-arm proposed a more generic extract vector intrinsic that accepts an index and stride. Now I'm wondering if we should just have a generic scalable shuffle vector intrinsic to handle all these operations under one intrinsic.
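A Python sketch of the element selection an index+stride extract would perform. This does not reproduce D94444's actual intrinsic signature; it only assumes that result lane i reads source lane index + i*stride. With that assumption, the indices form a step-vector mask. `step_vector` and `extract_strided` are hypothetical names.

```python
from typing import List, Sequence


def step_vector(n: int, start: int = 0, step: int = 1) -> List[int]:
    """Mask <start, start+step, start+2*step, ...> with n lanes."""
    return [start + i * step for i in range(n)]


def extract_strided(vec: Sequence, index: int, stride: int, n: int) -> List:
    """Hypothetical strided extract: lane i of the result is vec[index + i*stride]."""
    mask = step_vector(n, index, stride)
    # Reject any lane that would read outside the source vector.
    if any(m < 0 or m >= len(vec) for m in mask):
        raise IndexError("strided extract reads past the source vector")
    return [vec[m] for m in mask]
```

With stride 1 this degenerates to a plain subvector extract (index 4 of an 8-lane vector gives lanes 4..7). With stride 2 it is the even/odd deinterleave. That overlap is why a single shuffle-like operation could plausibly cover these cases.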
Jan 14 2021
Jan 13 2021
Add known minimum number of elements restrictions...
Jan 12 2021
Updated to @david-arm's suggested naming scheme...
I'm assuming scheduling the new addvls closer to their uses is a register pressure win?
Address some of @sdesmalen's comments, but deferring name changes...
Jan 11 2021
Jan 7 2021
Jan 6 2021
Jan 4 2021
Dec 26 2020
Dec 17 2020
Add FIXME comment.
Dec 15 2020
Dec 14 2020
Dec 11 2020
Dec 10 2020
LGTM with one nit below...
Dec 4 2020
I think @ctetreau's "first class citizen" argument on the RFC has merit though. But this patch is a good first step if we're not ready to extend ShuffleVector yet. I personally would like to see ShuffleVector extended eventually, since it would be easier to optimize.
Dec 1 2020
Do we need to protect against mismatched element types? Or does legalization handle those exts/truncs?
Nov 12 2020
Nov 10 2020
Nov 4 2020
Nov 3 2020
Reformat to appease pre-merge checks...
Nov 2 2020
Oct 30 2020
Update patch based on @nikic's comments...
Oct 28 2020
Updated patch with, I think, all the needed legalizations.
Oct 27 2020
Comment from ARM/ARMISelLowering.cpp:
Ah, I see it in ARM/. That will work...
Update 'neutral' element to -0.0.
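Why -0.0 rather than +0.0: under IEEE 754 round-to-nearest, x + (-0.0) returns x for every x, including x = -0.0. By contrast, (+0.0) + (-0.0) is +0.0, so a +0.0 start value would flip the sign of an all-negative-zero reduction. The following is a minimal Python sketch; Python floats are IEEE 754 doubles. `reduce_fadd` is a hypothetical scalar model, not the actual lowering.

```python
import math
from typing import Iterable


def is_neg_zero(x: float) -> bool:
    """True only for -0.0 (0.0 == -0.0 compares equal, so check the sign bit)."""
    return x == 0.0 and math.copysign(1.0, x) < 0


def reduce_fadd(values: Iterable[float], start: float) -> float:
    """Hypothetical scalar fadd reduction seeded with a 'neutral' start value."""
    acc = start
    for v in values:
        acc = acc + v
    return acc
```

Seeding with -0.0 preserves a result of -0.0 for [-0.0, -0.0], while seeding with +0.0 turns it into +0.0. For any non-zero input, both seeds give the same answer.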
Oct 23 2020
I just wanted to highlight my previous "VBITS_EQ_256-COUNT-33: fadd" comment, since it gives us a bit more test coverage and is something that will obviously fail (in a good way) once the splitting work is available.
@paulwalker-arm, back to the splitting discussion...
Updating patch, but it is not ready for a serious review yet, as I haven't started the splitting work. I'm still not convinced we can handle splitting appropriately with the current setup, but I will comment on that separately.
Oct 22 2020
Try again with 80 column fix...
Fix 80 column issue. No other changes intended...
Oct 19 2020
Some other notes:
It looks like NEON FADDA support is missing upstream too.
This is ready for review now...
Oct 14 2020
Not lowering to SVE for v2f## MVTs makes sense for now, but, as before, once we have proper support for v#i1 our hands will be tied.
Oct 12 2020
Oh, and I can do VECREDUCE_FADD first. I only hit VECREDUCE_SEQ_FADD before realizing I needed to add 'fast' to the intrinsic calls.
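Why the 'fast' flag matters: without reassociation, an fadd reduction must evaluate strictly left to right. That is the VECREDUCE_SEQ_FADD case, and it matches SVE's strictly-ordered FADDA. With reassociation allowed, VECREDUCE_FADD may be lowered as a tree. Below is a small Python model; the helpers `seq_fadd` and `tree_fadd` are hypothetical. It shows the two orders can disagree, which is why one cannot be substituted for the other without the flag.

```python
from typing import List, Sequence


def seq_fadd(start: float, values: Sequence[float]) -> float:
    """Strictly ordered reduction: ((start + v0) + v1) + ..."""
    acc = start
    for v in values:
        acc += v
    return acc


def tree_fadd(values: Sequence[float]) -> float:
    """Pairwise (tree) reduction: only a legal reordering under reassociation."""
    vals: List[float] = list(values)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(-0.0)  # pad odd lengths with the neutral element
        vals = [vals[i] + vals[i + 1] for i in range(0, len(vals), 2)]
    return vals[0] if vals else -0.0
```

For [1e100, 1.0, -1e100, 1.0], the ordered sum seeded with -0.0 is 1.0. The tree form absorbs both 1.0 terms into the large magnitudes and returns 0.0.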
You're right on all the comments. I stopped midway when I hit the legalisation issue, so that's why this patch is rough. That would need to be built out first before this work could continue. And I think NEON support should be built out before that. It sounds like I'm not stepping on toes, so I'll go in that order. Thanks.
Oct 9 2020
Oct 7 2020
Legalisation tests added for sve-fixed-length-fp-reduce.ll in:
Correct OR->EOR variable name.
Oct 5 2020
Legalisation tests added for sve-fixed-length-int-reduce.ll in:
Committed. The legalisations look reasonable. Something like:
Oct 2 2020
Oct 1 2020
Sep 29 2020
OK, I think that's all of them. It looks like this started with D87796 and was buried among other changes. To confirm: