Page MenuHomePhabricator

[LangRef] Update text for vscale to be more flexible but maintain original intent.
Needs RevisionPublic

Authored by paulwalker-arm on Sep 26 2022, 8:27 AM.

Details

Summary

llvm/docs/AArch64SME.rst contains a set of design decisions that
include controlled ways for an AArch64 binary to support multiple
values of vscale during program execution albeit not at the same time.

However, the LangRef prohibits this by requiring vscale to be constant
throughout the whole execution of a binary. This is overly restrictive
as limiting to function call boundaries should be sufficient.

Such a change does not affect existing code because changes of vscale
must be well defined within the IR (e.g. function attributes) of which
nothing exists prior to LLVM16 and so original behaviour is maintained.
For example, traditional function inlining can continue because while
the new definition allows the caller and callee to have different
vscales, neither function contain any decoration that allows such a
transition and thus it's safe to assume there is none. Essentially,
all complexity is pushed onto the implementors of vscale changing
behaviour

Diff Detail

Unit TestsFailed

TimeTest
60 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::lljit-initialize-deinitialize.ll
Script: -- : 'RUN: at line 1'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/lli -jit-kind=orc -jit-linker=jitlink -orc-runtime=/var/lib/buildkite-agent/builds/llvm-project/build/./lib/clang/16.0.0/lib/x86_64-unknown-linux-gnu/liborc_rt.a /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/lljit-initialize-deinitialize.ll | FileCheck /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/lljit-initialize-deinitialize.ll
190 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::priority-static-initializer.S
Script: -- : 'RUN: at line 4'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/priority-static-initializer.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/priority-static-initializer.S
240 msx64 debian > ORC-x86_64-linux.TestCases/Linux/x86-64::trivial-cxa-atexit.S
Script: -- : 'RUN: at line 3'; /var/lib/buildkite-agent/builds/llvm-project/build/./bin/clang -m64 -c -o /var/lib/buildkite-agent/builds/llvm-project/build/projects/compiler-rt/test/orc/X86_64LinuxConfig/TestCases/Linux/x86-64/Output/trivial-cxa-atexit.S.tmp /var/lib/buildkite-agent/builds/llvm-project/compiler-rt/test/orc/TestCases/Linux/x86-64/trivial-cxa-atexit.S

Event Timeline

paulwalker-arm created this revision.Sep 26 2022, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 8:27 AM
paulwalker-arm requested review of this revision.Sep 26 2022, 8:27 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 26 2022, 8:27 AM

I've create this patch in response to D131562's review comments. I've deliberately tried to keep things simple as I believe the complexity of what is and isn't allowed during a transition belongs with the mechanism that instigates the transition. I've also simplified the description of llvm.vscale rather than duplicate the finer details.

nikic requested changes to this revision.Sep 26 2022, 8:36 AM
nikic added a subscriber: nikic.

I don't think we can allow this as long as vscale constant expressions are supported (these are currently represented via gep of scalable type). I am rather strongly opposed to having constants that are not in fact constant, so this change should only happen after a migration towards llvm.vscale intrinsics.

This revision now requires changes to proceed.Sep 26 2022, 8:36 AM

I don't think we can allow this as long as vscale constant expressions are supported (these are currently represented via gep of scalable type). I am rather strongly opposed to having constants that are not in fact constant, so this change should only happen after a migration towards llvm.vscale intrinsics.

I haven't been following the scalable types story in a lot of detail, but doesn't disallowing GEP of scalable types imply that pointer calculations need to be done via decomposed integer arithmetic? Does this break for non-integral pointer types?

I don't think we can allow this as long as vscale constant expressions are supported (these are currently represented via gep of scalable type). I am rather strongly opposed to having constants that are not in fact constant, so this change should only happen after a migration towards llvm.vscale intrinsics.

I haven't been following the scalable types story in a lot of detail, but doesn't disallowing GEP of scalable types imply that pointer calculations need to be done via decomposed integer arithmetic? Does this break for non-integral pointer types?

I believe the ask here relates only to GEP based ConstantExpr expressions and is presumably part of the wider effort to limit how ConstantExprs are used. I don't object to the proposal as I've never liked this representation of vscale as to my mind it doesn't consider how DataLayout could mean the result isn't quite what you'd expect.

Killing off GEP constant expressions that involve indexing over a scalable type sounds like a fine idea. (I think we want to keep GEP instructions over scalable types... I mean, I guess we could get rid of them, but that seems more complicated.)

tschuett added a subscriber: tschuett.EditedSep 27 2022, 2:06 PM

I am still in favour of llvm.aarch64.sme.vscale().

nikic added a comment.Sep 27 2022, 2:47 PM

I don't think we can allow this as long as vscale constant expressions are supported (these are currently represented via gep of scalable type). I am rather strongly opposed to having constants that are not in fact constant, so this change should only happen after a migration towards llvm.vscale intrinsics.

I haven't been following the scalable types story in a lot of detail, but doesn't disallowing GEP of scalable types imply that pointer calculations need to be done via decomposed integer arithmetic? Does this break for non-integral pointer types?

Arbitrary arithmetic on pointers is represented via getelementptr i8, ptr %p, i64 %offset, it's not necessary (and highly undesirable) to drop down to ptrtoint + add + inttoptr.

For vscale in particular, it's not necessary to go that far:

%gep = getelementptr <vscale x N x %T>, ptr %p, i64 %index
; Is equivalent to
%vscale = call i64 @llvm.vscale.i64()
%scaled.index = mul i64 %index, %vscale
%gep = getelementptr <N x %T>, ptr %p, i64 %scaled.index

Though for the purposes of this change, I'm mainly concerned about the constant expression case, which would no longer be well-defined if vscale is non-constant. I think constant scalable GEPs are mostly used to represent vscale itself (using something like ptrtoint (ptr getelementptr (<vscale x 1 x i8>, ptr null, i64 1) to i64)).

Removing scalable GEPs entirely would also be beneficial, because our current support for scalable GEPs tends to be of the "just don't crash" kind and adding proper support for them is associated with a lot of additional complexity. On the other hand, a llvm.vscale-based representation "just works", e.g. BasicAA would be able to reason about it without further changes. But that's kind of orthogonal to this particular change.

I am still in favour of llvm.aarch64.sme.vscale().

I'm not sure this solves the problem, as after a transition to streaming mode, the value returned by vanilla vscale() will also be different.

I don't think we can allow this as long as vscale constant expressions are supported (these are currently represented via gep of scalable type). I am rather strongly opposed to having constants that are not in fact constant, so this change should only happen after a migration towards llvm.vscale intrinsics.

I haven't been following the scalable types story in a lot of detail, but doesn't disallowing GEP of scalable types imply that pointer calculations need to be done via decomposed integer arithmetic? Does this break for non-integral pointer types?

Arbitrary arithmetic on pointers is represented via getelementptr i8, ptr %p, i64 %offset, it's not necessary (and highly undesirable) to drop down to ptrtoint + add + inttoptr.

For vscale in particular, it's not necessary to go that far:

%gep = getelementptr <vscale x N x %T>, ptr %p, i64 %index
; Is equivalent to
%vscale = call i64 @llvm.vscale.i64()
%scaled.index = mul i64 %index, %vscale
%gep = getelementptr <N x %T>, ptr %p, i64 %scaled.index

Though for the purposes of this change, I'm mainly concerned about the constant expression case, which would no longer be well-defined if vscale is non-constant. I think constant scalable GEPs are mostly used to represent vscale itself (using something like ptrtoint (ptr getelementptr (<vscale x 1 x i8>, ptr null, i64 1) to i64)).

Removing scalable GEPs entirely would also be beneficial, because our current support for scalable GEPs tends to be of the "just don't crash" kind and adding proper support for them is associated with a lot of additional complexity. On the other hand, a llvm.vscale-based representation "just works", e.g. BasicAA would be able to reason about it without further changes. But that's kind of orthogonal to this particular change.

Sure, my point was that removing scalable GEPs entirely would mean that some hypothetical architecture with non-integral pointers and scalable types would be relying on GEP for address materialization, as plain integer arithmetic wouldn't work. So to me it seems we can only remove the constant expressions.

tschuett added a comment.EditedSep 27 2022, 4:06 PM

I am still in favour of llvm.aarch64.sme.vscale().

I'm not sure this solves the problem, as after a transition to streaming mode, the value returned by vanilla vscale() will also be different.

llvm.vscale() would still return the sve-scale.

Matt added a subscriber: Matt.Sep 27 2022, 8:18 PM

I am still in favour of llvm.aarch64.sme.vscale().

I'm not sure this solves the problem, as after a transition to streaming mode, the value returned by vanilla vscale() will also be different.

llvm.vscale() would still return the sve-scale.

Sorry if I'm misunderstanding you. After a transition to streaming mode, all SVE vectors become SVL length. If you were to use llvm.vscale() to compute a vector's size in this mode, it *must* return the same value as llvm.aarch64.sme.vscale(). In the case of streaming compatible functions which may be executing in any mode, I don't think you can know which intrinsic to call at compile time in order to get the correct vector length.

Although SME is my rational I'd rather keep things in the context of LLVM IR rather than any specific implementation. It is not the intent of this change to represent multiple values of vscale concurrently and thus we do not need a way to query anything but the current value of vscale. Please also remember that we're talking about more than just an intrinsic. llvm.vscale() returns the value of vscale from <vscale x 2 x i64>. If we were to add another intrinsic, say llvm.another_vscale(), then that implies we'd need a new vector type of the form <another_vscale x 2 x i64>, which is not something we need for SME enablement.

llvm.vscale() denotes vl and llvm.aarch64.sme.vscale() denotes svl. From the context you should know which one you need.

You want to change the semantics of vscale in a Phabricator Diff without an RFC on Discourse?

llvm.vscale() denotes vl and llvm.aarch64.sme.vscale() denotes svl. From the context you should know which one you need.

There are times, such as in streaming-compatible functions, where it's unknown at compile time which mode is active.

You want to change the semantics of vscale in a Phabricator Diff without an RFC on Discourse?

I think this patch was made in good faith in response to requests for a langref update in another patch.

You want to change the semantics of vscale in a Phabricator Diff without an RFC on Discourse?

We can start a Discourse thread for wider exposure, sure.