Adds a pattern to ARMInstrMVE.td to use a VQABS instruction if an equivalent multi-instruction construct is found.
No – VQABS is an integer instruction. The point is that it takes the absolute value of a signed integer, and gives you back something that fits in the same signed integer type, which means it has to map the largest negative value (say -128) to one less than its true absolute value (e.g. +127) or else it still ends up negative.
These magic numbers like 3712 and 2688 and so on could do with a comment explaining what they represent. From context it looks as if they're some kind of non-literal immediate encoding used by special node types like ARMvcmpz, ARMvmovImm and so on – but what real numbers do they represent?
This integer 3711 is different from the 3712 two lines above it. But in the other two patterns, the two corresponding numbers are identical (two copies of 2688, and two of 1664). If that's deliberate, could you add a comment saying why?
Since all three of these patterns look very similar, it ought to be possible to fold them all up into a class or multiclass. If you pass an MVEVectorVTInfo as one of the template parameters to the class, you should be able to extract both the vector type and the predicate type that goes with it (e.g. MVE_v16i8.Vec is v16i8, and MVE_v16i8.Pred is v16i1).
You might have to pass those magic integer constants as extra template parameters, though.
Ah yes, but think I got confused by the
in the pseudo-code of the instruction. And not that it really matters, but I guess that means the test just needs -mattr=+mve instead of -mattr=+mve.fp
Thanks for making me aware of that, I removed the superfluous .fp
A very good suggestion as this file needs more explanatory comments in it, I've added some now.
The difference emerges because the s16 and s32 variants used once ARMvmovImm ... to represent INT_MIN and once ARMvmvnImm ... to represent INT_MAX (by negating INT_MIN), while s8 uses ARMvmovImm ... both times and thus in the used encoding has to decrease the magic number for INT_MIN by one to represent INT_MAX.
Just updated the patch to do exactly this, it indeed looks much nicer.
Thanks, this does look much nicer!
Now I can read it more easily, I can see that what you're actually matching is effectively the vectorization of a pair of nested ternary-expressions, so that each lane is being independently transformed via the function
x > 0 ? x : (x == INT_MIN ? INT_MAX : -x)
which it's reasonably clear does implement the same function as the vqabs instruction.
I only have one small nitpick left.
This (i32 12), and the (i32 0) in the ARMvcmp a couple of lines below, are the only remaining magic numbers that have no explanation here. I think they deserve a comment explaining them: 12 and 0 are respectively the Arm architecture's condition-code encodings for GT and EQ, so this ARMvcmpz is testing if each lane is greater than zero, and the ARMvcmp below is testing if each lane is equal to (the corresponding lane of) int_min.
Thanks for reminding me, I decided to just add a comment that explains what expression the tree structure will match - hopefully that should make it clear what happens.