This pass looks for loops such as the following:
while (++i != max_len) if (a[i] != b[i]) break;
Although this is similar to memcmp, it is slightly different: instead of returning a value
whose sign reflects the first non-matching pair of bytes, it returns the index of the first
mismatch. As such, we are not able to lower this to a memcmp call.
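A minimal C sketch of the scalar idiom, for reference. It assumes the start index satisfies `i < max_len`, and the function name `first_mismatch` is hypothetical rather than anything defined by the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar form of the idiom. Starting just after index i, advance until
 * either max_len is reached or a[i] != b[i]. Returns the index of the
 * first mismatch, or max_len if every compared byte matched. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                             size_t i, size_t max_len) {
    /* Precondition assumed for this sketch: if i >= max_len, ++i would
     * step past max_len and the loop would never stop at max_len. */
    assert(i < max_len);
    while (++i != max_len)
        if (a[i] != b[i])
            break;
    return i;
}
```

For `a = "abcdefgh"`, `b = "abcdXfgh"`, `i = 0` and `max_len = 8`, this returns 4, whereas `memcmp(a, b, 8)` only reports that `a` orders after `b` (because 'e' > 'X'), not where the two buffers differ.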
Replacing this pattern with a specialised predicated SVE loop gives a significant
performance improvement on AArch64 targets that support SVE.
This patch introduces a new pass which identifies this pattern and replaces it with the
SVE loop. It is intended as a short-term solution until this is handled in the vectoriser.
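The following portable C sketch only emulates the shape of a predicated vector loop; it is not the IR the pass actually emits. `VL` is a hypothetical fixed vector length (real SVE vector lengths are implementation-defined, from 128 to 2048 bits), and `first_mismatch_predicated` is a hypothetical name. The `active` array plays the role of a governing predicate, so the final partial vector is handled without a scalar tail loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector length in bytes, fixed for this sketch only. */
#define VL 16

/* Same result as the scalar idiom, computed VL lanes at a time.
 * Returns the mismatch index, or max_len if none was found. */
static size_t first_mismatch_predicated(const unsigned char *a,
                                        const unsigned char *b,
                                        size_t i, size_t max_len) {
    assert(i < max_len);  /* same precondition as the scalar loop */
    size_t start = i + 1; /* the scalar loop pre-increments i */
    for (size_t base = start; base < max_len; base += VL) {
        bool active[VL], ne[VL];
        /* Governing predicate: lane k is active iff base + k < max_len. */
        for (size_t k = 0; k < VL; k++)
            active[k] = base + k < max_len;
        /* Predicated compare: inactive lanes are never read and yield false. */
        for (size_t k = 0; k < VL; k++)
            ne[k] = active[k] && a[base + k] != b[base + k];
        /* Locate the first true lane of the comparison result. */
        for (size_t k = 0; k < VL; k++)
            if (ne[k])
                return base + k;
    }
    return max_len;
}
```

In a real SVE loop the loads are predicated as well, so inactive lanes past `max_len` are never accessed; the short-circuiting `&&` stands in for that here.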
This patch also adds a new intrinsic that counts the trailing zero elements in a vector.
It has a generic lowering in SelectionDAGBuilder; on AArch64 with SVE enabled, it is
instead lowered to brkb and cntp instructions.
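To illustrate what the new intrinsic computes, here is a C emulation of the brkb + cntp sequence alongside a plain scan. Predicates are modelled as `bool` arrays with element 0 as the trailing element; `brkb`, `cntp`, `count_trailing_zero_elts`, `cttz_elts_sve_style` and `MAX_LANES` are hypothetical helpers for this sketch, not LLVM or ACLE APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LANES 64 /* hypothetical upper bound for this sketch */

/* Emulates BRKB (zeroing form): lanes before the first active, true lane
 * of src are set; that lane, later lanes and inactive lanes are cleared. */
static void brkb(bool *dst, const bool *pg, const bool *src, size_t n) {
    bool broken = false;
    for (size_t k = 0; k < n; k++) {
        if (pg[k] && src[k])
            broken = true;
        dst[k] = pg[k] && !broken;
    }
}

/* Emulates CNTP: counts lanes that are both active and true. */
static size_t cntp(const bool *pg, const bool *src, size_t n) {
    size_t count = 0;
    for (size_t k = 0; k < n; k++)
        count += pg[k] && src[k];
    return count;
}

/* Generic semantics: number of zero elements before the first nonzero one. */
static size_t count_trailing_zero_elts(const bool *mask, size_t n) {
    for (size_t k = 0; k < n; k++)
        if (mask[k])
            return k;
    return n;
}

/* SVE-style: an all-true governing predicate, then brkb and cntp. */
static size_t cttz_elts_sve_style(const bool *mask, size_t n) {
    bool pg[MAX_LANES], before[MAX_LANES];
    assert(n <= MAX_LANES);
    for (size_t k = 0; k < n; k++)
        pg[k] = true; /* plays the role of PTRUE */
    brkb(before, pg, mask, n);
    return cntp(pg, before, n);
}
```

Both functions return the index of the first nonzero element, or `n` when the mask is all zero in this sketch. In the mismatch loop, the mask would be the per-lane result of comparing `a[i] != b[i]`.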
Patch co-authored by Kerry McLaughlin (@kmclaughlin) and David Sherwood (@david-arm)
Note: This is a work in progress, see discussion on Discourse:
https://discourse.llvm.org/t/aarch64-target-specific-loop-idiom-recognition/72383
I wonder if something like "find first nonzero element" would be better?