Use the whole gamut of constant immediate values available to set up a vector. For example, instead of mov w0, #0xffff; dup v0.4s, w0, which transfers between register files, use the more efficient movi v0.4s, #-1. This is not limited to a few special values: it covers any immediate value that can be encoded by any of the variants of FMOV, MOVI, and MVNI.
Event Timeline
The case you implemented looks sensible, but I wonder whether it can be implemented more generically so that the pattern also covers other constants. Maybe it can be done with a ComplexPattern that encodes any MOVI-encodable constant as an immediate plus a shift and then uses those as inputs to MOVI. Is this something you have considered?
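To make the immediate+shift idea concrete, here is a minimal standalone sketch of the decomposition such a pattern would have to perform for 32-bit lanes. It is not LLVM code: the names MoviEncoding and encodeMovi32 are hypothetical, and it only covers the 32-bit LSL and MSL (shifting ones) forms of MOVI, ignoring the 8-bit, 16-bit, and 64-bit byte-mask forms as well as MVNI and FMOV.

```cpp
#include <cstdint>

// Hypothetical helper type for illustration only; not an LLVM type.
struct MoviEncoding {
  bool Valid;      // false if the value has no 32-bit LSL/MSL MOVI form
  uint8_t Imm8;    // 8-bit payload
  unsigned Shift;  // shift amount in bits
  bool ShiftOnes;  // true for MSL (ones shifted in), false for LSL
};

// Try to express a 32-bit per-lane constant as
//   MOVI Vd.4S, #imm8, LSL #(0|8|16|24)   or
//   MOVI Vd.4S, #imm8, MSL #(8|16).
constexpr MoviEncoding encodeMovi32(uint32_t Value) {
  // LSL forms: exactly one byte may be non-zero.
  for (unsigned Shift = 0; Shift < 32; Shift += 8) {
    if ((Value & ~(uint32_t{0xff} << Shift)) == 0)
      return {true, static_cast<uint8_t>(Value >> Shift), Shift, false};
  }
  // MSL forms: the low Shift bits are all ones and nothing is set
  // above the 8-bit payload.
  for (unsigned Shift = 8; Shift <= 16; Shift += 8) {
    const uint32_t Ones = (uint32_t{1} << Shift) - 1;
    if ((Value & Ones) == Ones && (Value >> (Shift + 8)) == 0)
      return {true, static_cast<uint8_t>(Value >> Shift), Shift, true};
  }
  return {false, 0, 0, false};
}
```

Under these assumptions, encodeMovi32(0x0000ffff) yields payload 0xff with an MSL shift of 8, while encodeMovi32(0x12345678) reports that no encoding exists, so a different materialization would be needed.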
You should probably just fix AArch64TargetLowering::LowerBUILD_VECTOR to generate the right node in the first place. Not sure why it isn't picking up these particular cases, but it has code to handle constant BUILD_VECTORS.
Indeed, the DUPs are coming from AArch64TargetLowering::LowerBUILD_VECTOR().
Which approach would be more elegant, and what are the pros and cons of each? On the one hand, handling this in lowering seems to be more efficient in terms of compile time. On the other hand, pattern matching may be more elegant.
You want to avoid repeating all the code in AArch64TargetLowering::LowerBUILD_VECTOR which already exists for matching constants... so you probably need to fix it in LowerBUILD_VECTOR.
The patch looks good to me.
Thanks!
| llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | |
|---|---|
| 6615 | Please commit this NFC cleanup as a separate patch. |