When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
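For reviewers who want the arithmetic spelled out, here is a minimal C++ sketch of the computation described above. The name `vectorTripCount` and its parameters are hypothetical, not the vectorizer's actual code, and the sketch ignores details such as tail folding or a required scalar epilogue.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not LLVM's API): number of scalar iterations that the
// vector loop covers. minVF is the known-minimum vectorization factor, so
// minVF * vscale is the number of elements handled per vector iteration.
uint64_t vectorTripCount(uint64_t tripCount, uint64_t minVF, uint64_t vscale) {
  assert(minVF != 0 && vscale != 0 && "step must be non-zero");
  uint64_t step = minVF * vscale;
  uint64_t remainder = tripCount % step; // the urem this patch is concerned with
  return tripCount - remainder;          // the rest runs in the scalar epilogue
}
```

The divisor of the remainder is only known at runtime, which is why the urem cannot be folded away without extra knowledge about vscale.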
If my understanding of vscale is correct (which I'm a bit unsure of, so please double check!), then vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i8> both fill one 64-bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)
We know from the RISC-V vector specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, which is itself a power of two, VLEN / 64 must be a power-of-two number of blocks. (This holds for everything except VLEN <= 32, but that case is already broken.)
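The argument above can be checked with a small C++ sketch. The names `kBitsPerBlock`, `vscaleForVLEN`, and `uremPow2` are hypothetical and chosen for illustration; the 64-bit block size is taken from the description above. It shows that every legal VLEN of at least 64 yields a power-of-two vscale, and that a remainder by a power-of-two divisor reduces to a mask.

```cpp
#include <cassert>
#include <cstdint>

// Assumed block size in bits, per the description above.
constexpr uint64_t kBitsPerBlock = 64;

// True if x has exactly one bit set.
constexpr bool isPowerOfTwo(uint64_t x) { return x != 0 && (x & (x - 1)) == 0; }

// Hypothetical model of vscale: the number of 64-bit blocks in one register.
constexpr uint64_t vscaleForVLEN(uint64_t vlen) { return vlen / kBitsPerBlock; }

// For a power-of-two divisor d, n urem d is the same as n & (d - 1).
uint64_t uremPow2(uint64_t n, uint64_t d) {
  assert(isPowerOfTwo(d) && "mask form is only valid for power-of-two divisors");
  return n & (d - 1);
}
```

Note that `vscaleForVLEN(32)` is 0 under this model, which is the VLEN <= 32 case excluded above.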
Careful review here is appreciated. I've been looking for a way to eliminate that urem, and am a bit concerned that this seems too easy. I might be missing something.
It is worth noting that AArch64 SVE explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.