This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Exploit fact that vscale is always power of two to replace urem sequence
ClosedPublic

Authored by reames on Jul 12 2022, 5:49 PM.

Details

Summary

When doing scalable vectorization, the loop vectorizer uses a urem in the computation of the vector trip count. The RHS of that urem is a (possibly shifted) call to @llvm.vscale.
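As a rough illustration of where that urem comes from, here is a minimal sketch of the trip count arithmetic. The function name and parameters are hypothetical, not the vectorizer's actual code, and the sketch ignores tail folding and any required scalar epilogue.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only.
// `step` stands for the (possibly shifted) vscale value, i.e. the number of
// scalar iterations handled by one iteration of the vector loop.
uint64_t vectorTripCount(uint64_t tripCount, uint64_t step) {
  // This remainder is the urem discussed in this review.
  uint64_t remainder = tripCount % step;
  // Round the scalar trip count down to a multiple of `step`.
  return tripCount - remainder;
}
```

For example, with 100 scalar iterations and a step of 16, the vector loop would cover 96 iterations and leave 4 for the scalar remainder loop.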

If my understanding of vscale is correct (I'm a bit unsure of this, so please double check!), then vscale is effectively the number of "blocks" in the vector register. (That is, types such as <vscale x 8 x i8> and <vscale x 1 x i64> both fill one 64 bit block, and vscale is essentially how many of those blocks there are in a single vector register at runtime.)

We know from the specification that VLEN must be a power of two between ELEN and 2^16. Since our block size is 64 bits, there must be a power-of-two number of blocks. (This holds for everything other than VLEN<=32, but that's already broken.)
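The argument can be sketched in a few lines. This is an illustration under stated assumptions, not the patch itself: the names below are hypothetical, the block size is taken to be 64 bits as in the summary, and VLEN is assumed to be a power of two of at least 64. Under those assumptions vscale is a power of two, so an unsigned remainder by it (or by a shifted multiple of it) reduces to a bit mask.

```cpp
#include <cstdint>

// Assumed block size in bits (64, per the summary above).
constexpr uint64_t kBitsPerBlock = 64;

// True when x has exactly one bit set.
constexpr bool isPowerOfTwo(uint64_t x) {
  return x != 0 && (x & (x - 1)) == 0;
}

// vscale as described in the summary: the number of 64-bit blocks in a
// vector register of `vlen` bits. Assumes vlen >= 64.
constexpr uint64_t vscaleForVLEN(uint64_t vlen) {
  return vlen / kBitsPerBlock;
}

// Unsigned remainder by a power of two, computed with a mask instead of urem.
// Only valid when `p` is a power of two.
constexpr uint64_t uremByPow2(uint64_t x, uint64_t p) {
  return x & (p - 1);
}

// VLEN = 256 gives vscale = 4; shifting by 1 gives a divisor of 8.
static_assert(isPowerOfTwo(vscaleForVLEN(256)), "vscale is a power of two");
static_assert(uremByPow2(100, vscaleForVLEN(256) << 1) == 100 % 8,
              "mask matches urem for a power-of-two divisor");
```

Shifting a power of two left keeps it a power of two, which is why the "possibly shifted" vscale in the trip count computation is covered as well.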

Careful review here is appreciated. I've been looking for a way to eliminate that urem, and am a bit concerned this seems too easy. I might be missing something.

It is worth noting that AArch64 SVE explicitly allows non-power-of-two sizes for the vector registers and thus can't claim that vscale is a power of two by this logic.
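To see why the power-of-two property matters, consider a small counterexample. The values are hypothetical and for illustration only: they assume a 384-bit SVE vector length with 128-bit blocks, which gives vscale = 3. With a divisor like that, masking no longer computes the remainder.

```cpp
#include <cstdint>

// The mask-based "remainder", which is only correct for power-of-two divisors.
constexpr uint64_t maskRemainder(uint64_t x, uint64_t d) {
  return x & (d - 1);
}

// Hypothetical non-power-of-two vscale (384-bit vectors / 128-bit blocks).
constexpr uint64_t kExampleVScale = 384 / 128;

static_assert(kExampleVScale == 3, "illustrative vscale");
static_assert(7 % kExampleVScale == 1, "true remainder of 7 by 3");
static_assert(maskRemainder(7, kExampleVScale) == 2,
              "the mask gives a different, incorrect answer");
```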

Event Timeline

reames created this revision. Jul 12 2022, 5:49 PM
Herald added a project: Restricted Project. Jul 12 2022, 5:49 PM
reames requested review of this revision. Jul 12 2022, 5:49 PM

Nice!

But I'd say that given that we define vscale=VLEN/64 and we know VLEN is a power of two >= 64 (ignoring 32 which we know is broken), isn't that sufficient justification?

reames updated this revision to Diff 444291. Jul 13 2022, 8:59 AM

Simplify justification comment per reviewer suggestion.

This revision is now accepted and ready to land. Jul 13 2022, 9:09 AM
frasercrmck accepted this revision. Jul 13 2022, 9:16 AM

LGTM too (thanks for correcting me in that it's really RVVBitsPerBlock even if that's always 64)

This revision was landed with ongoing or failed builds. Jul 13 2022, 10:55 AM
This revision was automatically updated to reflect the committed changes.