User Details
- User Since
- Oct 21 2016, 1:19 AM (292 w, 11 h)
Wed, May 25
Tue, May 24
Nice cleanup.
Thu, May 19
This change looks right to me.
Tue, May 17
Mon, May 16
We have a similar fix downstream and I actually thought we already upstreamed this. In any case, good fix!
Fri, May 13
Thu, May 12
That's what I did initially, but I couldn't write a test with a ZExt/SExt as ConstExpr that wouldn't fold, so that would mean I'd rewrite the code without being able to test it.
- Addressed comments.
- Reverted the fixed-width change (leading to different addrspacecast constant)
Wed, May 11
The tests all seem valid, but it's a bit difficult to review this patch without knowing what it's really trying to test without really going through AArch64TargetLowering::ReconstructShuffle in detail. That function is quite big/complicated, so it's not trivial to spot whether anything is missing or redundant.
My main question is whether all the insert/extract-subvector tests need to test all the permutations of element type and subvector?
Tue, May 10
Mon, May 9
Wed, May 4
Thanks for working on this @RosieSumpter!
Fri, Apr 29
Looks like a nice improvement and the code is now a bit more sensible.
Did you do any performance measurements to get an impression of the performance impact of this change?
Apr 27 2022
Thanks for this patch Dave, I've left a few comments.
Thanks for this @RosieSumpter. If you remove the 'WIP' and s/SVE/SVE2/ from the title and description, I'm happy to accept. The negative tests for the SVE(1) intrinsics can be added as a separate patch.
Apr 7 2022
Mar 28 2022
Mar 18 2022
Hi @t.p.northover, I've added you as a reviewer since you're the AArch64 backend maintainer and the approach set out in this patch has some implications on (forward-)compatibility of the IR.
Updated some of the comments and changed assembler behaviour to let +sme
enable all streaming-compatible SVE(2) instructions without necessarily
requiring +sve2 as well.
Hi @rsandifo-arm, thanks for your comments!
Mar 17 2022
Mar 16 2022
Thanks for the fix @malharJ, I'm happy with the patch now.
Mar 15 2022
Simplified the patch by setting +pstate-sm1 and +pstate-za1 by default,
as neither of these have any effect without +sme. This reduces the need
to have separate internal/external flags for SME like in the first
revision because having +sme imply +pstate-sm1 and +pstate-za1 led
to problems when disabling the pstate features (which is kind of their
intended use).
Mar 11 2022
Hi @malharJ, thanks for this fix! I believe this is doing the right thing, because for scalable vectors an instruction should either be considered uniform after vectorization, or it shouldn't be considered scalar.
D121297 has been committed, so I think this patch should be good to land.
Mar 9 2022
Fair enough. Since you're adding the void to the prototype for a reason (the diagnostic behaviour now changes), I figured you may as well want to test it.
This is missing tests for svundef, svrdffr, svsetffr and svpfalse?
Showing a failure on build failures is one of the things that we rely on the buildbots for, so I agree it makes sense to check for NOEXE. Thanks for the fix!
Mar 8 2022
Mar 7 2022
Mar 4 2022
Mar 3 2022
I've seen this in other places as well where introducing explicit implicit defs leads to less efficient register allocation, so I can see this is an improvement.
Apologies for the delay in reviewing, but this looks good to me.
Thanks for the update, LGTM!
Mar 2 2022
Looks accurate, thanks!
Mar 1 2022
- Replaced nonstreaming-compatible-sme with streaming-agnostic-sme
- Replaced streaming-sme with streaming-sve.
Feb 28 2022
Feb 24 2022
Added condition to findMoreOptimalIndexType that index is a vector of
i64's and swapped order of shl/mul since scale is guaranteed to be constant.
Feb 23 2022
Feb 22 2022
Addressed @david-arm's comments.
Addressed @c-rhodes' comments.
Feb 21 2022
Feb 17 2022
LGTM!
Feb 16 2022
Feb 15 2022
Thanks, LGTM!
Hi @kmclaughlin, thanks for updating the tests as MIR tests, these look really good now! I've still requested changes on the patch, because the scaling doesn't look right yet. It seems like this offsets the base pointer with the wrong value/offset. It should be pretty trivial to fix that though.