This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Use knowledge of VLEN to avoid over-aligning the stack
Needs ReviewPublic

Authored by frasercrmck on May 26 2022, 7:08 AM.

Details

Summary

With D125787, the size of the RVV portion of the stack is a
16-byte-aligned to help maintain the overall alignment of the stack as a
whole. Alignments of the RVV objects themselves are also correctly
aligned.

The naive implementation in that patch ensured these alignments while
ignoring the effect that the implicit multiplication by VLENB has on
alignment. Since VLEN is a power of two we often take on extra
alignment when calculating the final object offsets.

This patch now takes VLEN into account when calculating RVV object
offsets by reducing the required alignment by the known 8-byte multiple
we receive from VLEN.

Diff Detail

Event Timeline

frasercrmck created this revision.May 26 2022, 7:08 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2022, 7:08 AM
frasercrmck requested review of this revision.May 26 2022, 7:09 AM

I'm a bit confused by this patch. Where does the VLEN > 32 bit come from? All I kind find in the spec is that VLEN >= ELEN, and must be a power of 2. Given ELEN only has to be 8, shouldn't the smallest VLEN also be 8? This seems like an utterly useless configuration, but the spec seems to allow it?

I'm a bit confused by this patch. Where does the VLEN > 32 bit come from? All I kind find in the spec is that VLEN >= ELEN, and must be a power of 2. Given ELEN only has to be 8, shouldn't the smallest VLEN also be 8? This seems like an utterly useless configuration, but the spec seems to allow it?

VLEN is determined by the Zvl32b, Zvl64b, Zvl128b, etc. extensions. V implies Zvl128b. Zve64* implies Zvl64b. Zve32* implies Zvl32b. VLEN can never be less than 32 with the currently defined extensions. And our support VLEN == 32 is broken in many ways. We should probably discuss that at some point.

craig.topper added inline comments.May 26 2022, 10:33 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
928

VLEN can be 32. But our calculation of vscale and scalable types is broken for it.

I'm a bit confused by this patch. Where does the VLEN > 32 bit come from? All I kind find in the spec is that VLEN >= ELEN, and must be a power of 2. Given ELEN only has to be 8, shouldn't the smallest VLEN also be 8? This seems like an utterly useless configuration, but the spec seems to allow it?

VLEN is determined by the Zvl32b, Zvl64b, Zvl128b, etc. extensions. V implies Zvl128b. Zve64* implies Zvl64b. Zve32* implies Zvl32b. VLEN can never be less than 32 with the currently defined extensions.

Thanks for the pointer. The wording here is as follows:
"Note: Explicit use of the Zvl32b extension string is not required for any standard vector extension as they all effectively mandate at least this minimum, but the string can be useful when stating hardware capabilities."

Reviewing 18.2 and 18.3 confirms that none of the proposed vector variants allow VLEN < 32.

I'm a bit confused by this patch. Where does the VLEN > 32 bit come from? All I kind find in the spec is that VLEN >= ELEN, and must be a power of 2. Given ELEN only has to be 8, shouldn't the smallest VLEN also be 8? This seems like an utterly useless configuration, but the spec seems to allow it?

VLEN is determined by the Zvl32b, Zvl64b, Zvl128b, etc. extensions. V implies Zvl128b. Zve64* implies Zvl64b. Zve32* implies Zvl32b. VLEN can never be less than 32 with the currently defined extensions.

Thanks for the pointer. The wording here is as follows:
"Note: Explicit use of the Zvl32b extension string is not required for any standard vector extension as they all effectively mandate at least this minimum, but the string can be useful when stating hardware capabilities."

Reviewing 18.2 and 18.3 confirms that none of the proposed vector variants allow VLEN < 32.

It's a fair point. I don't really suppose it's worth mentioning that it's at least 32 as this should work for any power of two at least 8. Do you think specifically mentioning 32 may make people assume it relies on that property?

A note on this patch I forgot to add yesterday - it does affect the nominal offsets we compute for RVV frame objects (see wrong-stack-offset-for-rvv-object.mir - the object is now "at" offset -8 rather than -16). Another option could be to keep the offset calculation as it is now but change how we materialize the instructions to reach those offsets. I thought this way was perhaps slightly cleaner and less error-prone: it's one place, rather than forcing all disparate users of this info to compute offsets correctly. Just it crossed my mind that someone reading offset: -8 may be forgiven for thinking that we're not correctly aligning objects. Let me know what you think. I think both ways have their drawbacks - neither are super intuitive.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
928

Ah yes, thanks - slip of the fingers. I wanted to say greater than or equal to 32. It's probably not worth a caveat here that we don't properly support 32 here, is it? It's largely irrelevant since the calculation works for any power of two.

rebase/ping

reames requested changes to this revision.Jun 30 2022, 9:47 AM
reames added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
924

Everything above this line appears to be an NFC change + an early return. Please split that, land, and rebase.

This is minor, but since I've now looked at this code several time when glancing at this patch, I'd rather not spend time glancing at it again.

936

This really doesn't look correct to me.

Given an object aligned to 32 bytes, knowing that it's a multiple of 16 bytes does not allow you to align it only to 2 bytes. Which, unless I'm missing something, is what this code does.

I think what you need here is to compute the running alignment, use the MinVLen information to update that alignment, then if the required alignment is greater than that, adjust the offset.

It's entirely possible I'm misreading this though. Having a ObjectSize which is a integer for an object which is inherently variable sized doesn't make sense to me. I'm not entirely sure what ObjectSize actually represents here. Maybe there's just some missing comments in the original code?

This revision now requires changes to proceed.Jun 30 2022, 9:47 AM

rebase on split patch

frasercrmck marked 2 inline comments as done.

fix typo

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
924

Landed separately, thanks for the suggestion.

928

(finally) fixed this typo.

936

Right, that's basically what it does. I think it's correct.

The ObjectSize is the "known min size" portion of the scalable type size in bytes. The scalable portion is implicitly left for us (the target) to handle. We're artificially lowering the scalable parts of the offsets knowing that they'll all later be multiplied by VLENB. Since we're in control of the vscale multiplying, as long as all scalable objects are consistently treated in this manner, it should work out correct.

restore manual mir test checks for wrong-stack-offset-for-rvv-object

Thanks for the review, @kito-cheng! I'll use this opportunity to ping this patch once again as I'm not super comfortable landing it if there's still objections from people, e.g. @reames