Memory functions (memcmp, memcpy, ...) are typically recognized by the
compiler and expanded to specific asm patterns when the size is known at
compile time.
This patch will help catch regressions in those expansions.
Right now we're only testing memcmp (see context in D60318).
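For illustration only, here is a minimal sketch of the distinction described above. The names compareFixed and compareDynamic are hypothetical and do not come from the patch. Whether the fixed-size call is actually expanded inline depends on the compiler, target, and optimization level.

```cpp
#include <cstddef>
#include <cstring>

// Size is a compile-time constant: an optimizing compiler may expand this
// memcmp into a short sequence of loads and compares instead of a call.
template <std::size_t N>
int compareFixed(const char *p, const char *q) {
  return std::memcmp(p, q, N);
}

// Size is only known at run time: this typically stays a library call.
int compareDynamic(const char *p, const char *q, std::size_t n) {
  return std::memcmp(p, q, n);
}
```

Comparing the generated assembly for the two functions (for example with -O2 -S) is one way to see whether the expansion happened.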
Magic constant
I'm guessing that 4096 is there to limit the maximum size of the p and q
buffers, implying that they should fit into the L1 cache?
Do you want to use the actual L1 size instead?
Otherwise,