This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Permit larger RVV stacks and stack offsets
ClosedPublic

Authored by frasercrmck on Jun 22 2021, 10:58 AM.

Details

Summary

This patch teaches the compiler to generate code to handle larger RVV
stack sizes and stack offsets which resolve an amount larger than 2047
vector registers in size.

The previous behaviour was asserting on such large values as it was only
able to materialize the constant by feeding it to the 12-bit immediate
of an ADDI instruction. The compiler can now materialize this amount
into a temporary register before continuing with the computation.

A test case for this scenario is included which also checks that the
temporary register used to materialize the amount doesn't require an
additional spill slot over what we're already reserving for RVV code.

Diff Detail

Event Timeline

frasercrmck created this revision.Jun 22 2021, 10:58 AM
frasercrmck requested review of this revision.Jun 22 2021, 10:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 22 2021, 10:58 AM

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

jrtc27 added a comment.EditedJun 22 2021, 11:08 AM

"Support more than 2047 vector register stack slots"?

Although this isn't even that, this is "Support spilling more than 2047 vector elements"?

Although this isn't even that, this is "Support spilling more than 2047 vector elements"?

Yeah, sort of? It is a fairly generic function used both to set up the stack by decrementing the stack pointer by the RVV stack size and also to fix up frame indices, so it's not strictly equal to spilling this or that number of registers. Stack slots is perhaps a little better, actually, but I could still think of some pathological case where I have relatively few vectors on the stack but with really high alignments that effectively push up the stack size and thus the offsets. Maybe I should just say that we can now support larger "VLEN-factored amounts"? That's sort of meaningless to the casual reader though. Oh what a can of worms.

rogfer01 accepted this revision.Jun 24 2021, 2:28 AM

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

I think "number of vector registers" may be good enough. This seems difficult to describe because the offsets in the scalable vector stackid have their offsets conceptually scaled to the smallest vscale possible (=1), so the code counts how many registers (currently 8 bytes per VR under vscale=1) are contained in (the unscaled) Amount to later scale that using vlenb.

I'm curious when you hit this. Asking because, unless I'm reading the testcase wrong, you're computing a NumOfVReg that seems huge (3072?)

Other than that LGTM. Thanks @frasercrmck

This revision is now accepted and ready to land.Jun 24 2021, 2:28 AM

Strictly speaking it's not 12-bit stack offsets we're now supporting but rather a 12-bit "number of vector registers". For whatever reason I was unable to find a concise way of explaining this in the commit message/description so if anyone's got a better way I'm all ears.

I think "number of vector registers" may be good enough. This seems difficult to describe because the offsets in the scalable vector stackid have their offsets conceptually scaled to the smallest vscale possible (=1), so the code counts how many registers (currently 8 bytes per VR under vscale=1) are contained in (the unscaled) Amount to later scale that using vlenb.

I'm curious when you hit this. Asking because, unless I'm reading the testcase wrong, you're computing a NumOfVReg that seems huge (3072?)

Other than that LGTM. Thanks @frasercrmck

Yeah I suppose the code says it best: it's the number of VLEN-sized registers. I have no idea why I find this part of the code so unintuitive. Probably for the reason you describe.

As for how we reached this situation, I was really trying to "torture test" the code generator by really pushing the vectorization factor in our OpenCL implementation. Basically it ended up force-vectorizing kernels containing things like <16 x i64> up by vscale x 16 and so was ending up with nonsensically wide vectors like <vscale x 256 x i64> which must of course spill. I'm not claiming that this is the sole reason as I've seen some particularly bad code generation even at the lower VFs (anything larger than vscale x 1 can end up spilling even for small kernels). But yeah if there's ever a reason to spill a silly amount of registers, this'd be it. The good news is that this was the only failure I saw across the test suite!

The test case itself is just a pseudo-random number I chose that's higher than 2048 but isn't near a power of two (meaning it'd choose to shift) I haven't been through all the "real" failures: the only one I looked at was somewhere in the 2000s.

frasercrmck retitled this revision from [RISCV] Permit RVV stack offsets larger than 12 bits to [RISCV] Permit larger RVV stacks and stack offsets.Jun 24 2021, 8:53 AM
frasercrmck edited the summary of this revision. (Show Details)
frasercrmck edited the summary of this revision. (Show Details)

remove explicit use of TII->

This revision was automatically updated to reflect the committed changes.