- User Since
- Jun 23 2020, 7:14 AM (143 w, 5 d)
Tue, Mar 14
Wed, Mar 8
Feb 6 2023
Feb 2 2023
Split out unrelated changes, added more tests, unified the common logic, and renamed Passthrough to Symmetric.
Feb 1 2023
I've changed the approach to not be so focused on "complex addition with zero rotation", and instead common operations applied to both sides (dubbed "symmetric operations"). The current implementation doesn't allow for targets to define their own supported operations, but also I'm not sure if there are many (if any at all) operations that wouldn't be common.
Jan 24 2023
Jan 20 2023
Jan 18 2023
Dec 12 2022
Dec 8 2022
Nov 16 2022
Nov 15 2022
Addressed comments, and added more tests
Nov 14 2022
This was pushed over to be the responsibility of the target. Therefore this approach is no longer relevant.
Nov 11 2022
Nov 8 2022
Addressed comments, and rebased to align with latest revision of D114174 (Diff 16)
Oct 21 2022
Oct 12 2022
Thanks @dmgreen for the extra tests.
Sep 28 2022
Added more interesting test cases, covering triangle and diamond multiply cases.
Sep 21 2022
Addressed comments, and redesigned the lookup list of nodes to now be represented as a graph.
Sep 12 2022
Sep 9 2022
Forgot to submit the comments adjoining the patch...
Sep 7 2022
Sep 5 2022
Addressed comments and improved case coverage,
Aug 4 2022
Addressed comments, cleaned up debug outputs, and implemented vector splitting
I've pre-committed the tests, and redesigned the core algorthm to operate on pairs of operands, rather than analysing the uses of a given instruction. The vector splitting has also been implemented on the target side
Jul 4 2022
Looks like I messed up my git-fu somewhere in preparing these patches. There are a few cleanup changes in this patch that should've been in a previous one. I'll try and clean that up when addressing other comments.
Added context missing from initial patch
Addressed comments, and added a file description that attempts to explain what the pass does.
Jun 22 2022
While that would work "simply" enough for cases like a * b, more elaborate cases (e.g. (a * b) * c) would result in some ambiguity as to which add(mul, mul) pairs with which sub(mul, mul). This complexity inflates further the more operands are involved. The approach I've implemented here considers each complex node in isolation (with the exception of the accumulator value).
I presume the == 128 can be removed if we teach it how to split the vectors up?I presume the == 128 can be removed if we teach it how to split the vectors up?
Yep, vector splitting is something I decided to push out of this initial patch, and will be implemented in a follow-up. (Due to each node being treated in isolation, the vector splitting from previous iterations got overly complicated and unwieldy). The ideal solution that I can see would be to teach the intrinsic how to split, rather than the pass (somewhere like DAGTypeLegalizer::SplitVectorResult).
Is this cost necessary at the moment, or will it always be profitable for the simple cases?
Complex add has a few cases where I've observed performance regressions, and that's the sort of thing this rudimentary cost-modelling is intended to catch.
Jun 21 2022
Since the last patch, I've redesigned the high-level approach; It now crawls through the instruction graph to find opportunities for complex deinterleaving before attempting to perform said deinterleaving. Doing it this way allows us to short-circuit, bailing out and preventing the heavy work from being performed if something isn't supported.
This iteration also implements support for complex operations with multiple steps/operands (e.g. a[i] * b[i] * c[i])
May 23 2022
Apr 8 2022
Apr 4 2022
Code looks good to me, but a small nitpick on the function name
Mar 31 2022
Mar 29 2022
Mar 28 2022
Mar 21 2022
It's been longer than I would've liked before I've gotten back to this, but I've;
Feb 17 2022
I wasn't aware of the implicit truncation. In that case, LGTM
Looks good overall, with potential wins in both performance and codesize. Just 1 minor thing (and a nitpick about a comment).
Feb 10 2022
Thanks all for the comments so far (And thanks Dave for taking on the evening shift, as it were)
Feb 9 2022
Feb 8 2022
The supportsComplexArithmetic method doesn't add a lot on its own. It would probably be better to represent it in terms of asking the backend if it has certain patterns of certain types available. An f32 fcadd, for example, or a i16 fcmul_with_rotate (I'm not sure what to name them).
The ComplexArithmeticGraph seems like internal details to the pass. It would be best not to pass it over the TTI interface boundary if we can. The backend just needs to create the correct intrinsics of the correct type.
Equally I would keep all the matching logic in the pass, not through matchComplexArithmeticIR. That would be similar to, for example, the MachineCombiner pass which has its own enum of a set of patterns.
Unfortunately, not all architectures/intrinsics follow the same patterns, so this was done to provide a way for architecture-specific stuff to be handled as needed. The MVE halving complex add (which I only now realise was included in this patch) is a good example of this; It only works on integers, and requires some extra instructions to be matched. This instruction is only in MVE, it is not part of NEON in any way.
The intrinsics that are common are also constructed differently; While the NEON intrinsics are separated out by their rotations (aarch64_neon_vcmla_rot0, aarch64_neon_vcmla_rot90), MVE squishes them into one and determines the rotation based on a parameter.
ComplexArithmeticGraph is used to encapsulate all the data the TTI might need in order to correctly generate the right intrinsic. It could be replaced by something a bit lighter (and that doesn't expose the implementation detail of the node list), but something will be needed to provide this info.
Feb 7 2022
Applied feedback received offline, redesigned the interface/implementation, and fixed up a number of fundamental issues in the previous iteration.
Jan 5 2022
Dec 15 2021
Dec 13 2021
Fixes the test to pass with this patch in isolation (i.e. without the child revision on top)
Dec 9 2021
Dec 7 2021
Dec 3 2021
Dec 1 2021
I've got a follow-up patch nearly ready with tests, yep. I want to settle on an approach here before implementing the other side of it.
Nov 25 2021
Nov 19 2021
Nov 18 2021
Sep 28 2021
Sep 21 2021
Sep 10 2021