This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Enable (scalable) vectorization by default
ClosedPublic

Authored by reames on Jul 1 2022, 1:14 PM.

Details

Summary

This change enables vectorization (using scalable vectorization only, fixed vectors are not yet enabled). There is no test change as all of our tests are fixed to particular vectorization flags; if desired, I can add a test which exercises the default heuristics.

At this point, the resulting configuration should be both stable (e.g. no crashes), and profitable (i.e. few cases where scalar loops beat vector ones), but is not going to be particularly well tuned (i.e. we emit the best possible vector loop). The goal of this change is to align testing across organizations and ensure the default configuration matches what downstreams are using as closely as possible.

I would appreciate any help testing this before it lands. I've detailed my testing to date below, but as a practical matter, it's smaller than I'd prefer for something this major.

So far, I have successfully cross built the following: sqlite3, LLVM, Clang, Flang, LLD, LLVM's test-suite, spec2017. Not all of these *link* due to problems with my cross compilation environments, but we do compile all of the source files, and all crashes have been fixed.

Additionally, I have successfully *run* (on qemu-riscv64*) the llvm-lit portion of check-llvm, and the test-suite. Both have some failures, but everything I've looked at appears to be due to either a) human error in the run setup or b) cross build configuration problems. So, we have at least some confidence that we're not miscompiling when vectorization is enabled.

  • I had to use a downstream qemu-riscv64 implementation as the package available on ubuntu appears to not include +v at all.

Additionally, sqlite3, and clang + LLVM are invalid cost clean - meaning the cost model never returns Invalid when compiling them. Other codebases - in particular test-suite - do return Invalid costs, and I don't consider that to be blocking. After the fixes to bailout properly on invalid costs, an invalid cost should prevent vectorization, but otherwise have no impact.

I'll note that there's a bunch of work pending to improve the output of the vectorizer. At the moment, I believe this all to be tuning work, and do not consider any of it blocking for this patch.

I have not done any native builds, or been able to run any of the resulting code on real hardware. If anyone else has the potential to do so, I'd greatly appreciate the help.

Diff Detail

Event Timeline

reames created this revision.Jul 1 2022, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptJul 1 2022, 1:14 PM
reames requested review of this revision.Jul 1 2022, 1:14 PM
reames added a comment.Jul 1 2022, 1:26 PM

If anyone wants to get a feel for the code differences, you can see some here: https://github.com/preames/simple-vector-riscv/tree/main/output. Look at the difference between VLEN=0_SCALABLE=off and VLEN=0_SCALABLE=on. The former is the old default, the later is after this patch. (Ignore the VLEN!=0 cases, that's for when we enable fixed length too.)

reames updated this revision to Diff 448075.Jul 27 2022, 9:33 AM

Rebase over test added to show changes in default.

Given the 15.x release branch has been branched, I'd like to go ahead and land this. Can I get an LGTM?

This revision is now accepted and ready to land.Jul 27 2022, 11:13 AM
tschuett added inline comments.
llvm/lib/Target/RISCV/RISCVTargetTransformInfo.h
60

Why is there no override?

This revision was landed with ongoing or failed builds.Jul 27 2022, 12:36 PM
This revision was automatically updated to reflect the committed changes.

It came to my attention on Friday that the Linux kernel is missing support for vector save/restore. As a result, if you try to run vector code in a Linux environment on real hardware or qemu-system, all vector instructions will generate illegal instruction faults. At least some versions of qemu-user do allow execution of vector code; I have been told this might be a bug in qemu-user as the save/restore context may also be missing there as well. (Not confirmed.)

Its worth noting that this issue applies to *all* vector code, not just the result of auto-vectorization. As such, we could chose to look at this as a user-error - specifying +v in an environment that doesn't actually support it - but that feels a tad user hostile. I'm going to chat this around with a few folks, but I'm expecting we will chose to turn off vetorization until the underlying support is widely available.

This is user error, plain and simple. If you say you have extension foo then the compiler should be free to use it. V isn't special; specifying +zba for example when your implementation lacks it has the exact same behaviour. If anything, leaving autovectorisation off would be more inconsistent.

This is user error, plain and simple. If you say you have extension foo then the compiler should be free to use it. V isn't special; specifying +zba for example when your implementation lacks it has the exact same behaviour. If anything, leaving autovectorisation off would be more inconsistent.

What if we had a -mcpu that included V? Would specifying that CPU when the kernel doesn't support it also be user error?

This is user error, plain and simple. If you say you have extension foo then the compiler should be free to use it. V isn't special; specifying +zba for example when your implementation lacks it has the exact same behaviour. If anything, leaving autovectorisation off would be more inconsistent.

What if we had a -mcpu that included V? Would specifying that CPU when the kernel doesn't support it also be user error?

I don't see how it's any different from equivalent situations on other architectures... it's a bit confusing but can't avoid it. Would you say the same thing for -mcpu enabling F on an OS that doesn't support floating-point state and then users being sad their float-using code traps rather than uses soft-float routines? Or what about SVE on Arm? V isn't special, it's just the most likely situation you'll hit the issue at the moment.

reames added a comment.Aug 9 2022, 7:51 AM

To me, F is a better analogy than Zba because it is stateful. With state, we have both signal problems, and context switch problems with registers effectively taking random values from the perspective of the running thread. This doesn't exist for a non-stateful extension.

However, it's worth saying explicitly that we don't actually get to that point for vector. Because the kernel currently doesn't enable vsstatus.VS, the current behavior of vector code is to fault with an illegal exception on any vector instruction or access to a vector state csr. So, the register state corruption is more a theoretical than practical concern.

I don't have a strong opinion here. I could see disabling auto-vec, and I could see leaving it enabled, and labeling this user error per Jessica's argument. Both have serious downsides, and I'm not clear which is better in the long run. Anyone else have an opinion?

reames added a comment.Aug 9 2022, 7:54 AM

One point worth mentioning here is that we don't seem to have an upstream CPU definition which enables any vector extension. Thus the only folks who might be impacted by @craig.topper's -mcpu question are downstream users using a vendor compiler. That might shift the calculus towards leaving this enabled as such users are likely to be using a stack which is vector enabled.

asb added a comment.Aug 9 2022, 8:06 AM

I think considering it user error if they're trying to run V binaries on a system that hasn't implemented the proper kernel-side support is probably the best option we have here.

The other option that came to mind was emitting a warning, but that warning could quickly become dated.

The fact that no existing upstream CPU definitions enable V support is a good point - this is probably something people will want to keep in mind when adding new definitions. It sounds like people will only get vectorisation if using a downstream toolchain or explicitly asking for V support, which minimises the risk of blowing up some previously working configuration.

The CPU does support the vector extensions in this case though. The problem is that the default hardware boot state does not support it. A loss of state due to the kernel not maintaining the hardware state is different than an immediate trap due to the use of an instruction. Taking x86 as an example, people check whether they have a specific processor not whether they have a specific processor, kernel version x, libc y, version z, etc. In the case of the other architectures, the kernel support for the architectural features has been there since the beginning and no one would possibly have a kernel that is missing the functionality.

I agree with @asb that a warning may become dated and annoying (not to mention people building with -Werror would be rather unhappy; I suppose a remark could work?). Perhaps we should be checking whether the kernel supports the v extension? Kernel updates are not always guaranteed and people do switch between kernel versions. This seems like a case where we may be in a state where we cannot safely determine if the cpu specification and the vended kernel can work together. A failure to boot with a trap early in the system setup is difficult to diagnose.

@compnerd Two points inspired by your response.

  1. Are you saying that the uninitialized state is not guaranteed to have VS set to off? This differs from my current understanding, and I don't see how this would follow from the wording in the vector spec. If this is true, that's nasty and we should definitely clarify that. Pointers to spec or discussion on this would be quite helpful.
  1. clang is inherently a cross compiler. We can't ask question of the host environment (e.g. kernel version). To my knowledge, this information is not available in any information we do have about the target platform. Am I missing something here?
kito-cheng added a comment.EditedAug 10 2022, 8:01 AM

clang is inherently a cross compiler. We can't ask question of the host environment (e.g. kernel version). To my knowledge, this information is not available in any information we do have about the target platform. Am I missing something here?

We could check with /proc/cpuinfo like -mcpu=native for most other targets, V only appeared in the /proc/cpuinfo only if V is supported in kernel[2].

Ideally this issue should resolved by checking Tag_RISCV_arch[1] in kernel loader.

[1] https://github.com/riscv-non-isa/riscv-elf-psabi-doc/pull/292
[2] https://patchwork.kernel.org/project/linux-riscv/patch/7fcce42051191f2c90f536d0bdbc57da1ee1d9c2.1652257230.git.greentime.hu@sifive.com/

@compnerd Two points inspired by your response.

  1. Are you saying that the uninitialized state is not guaranteed to have VS set to off? This differs from my current understanding, and I don't see how this would follow from the wording in the vector spec. If this is true, that's nasty and we should definitely clarify that. Pointers to spec or discussion on this would be quite helpful.

Poorly worded on my side - I meant that it being defaulted to off, which means it is "unsupported" by default from the application PoV.

  1. clang is inherently a cross compiler. We can't ask question of the host environment (e.g. kernel version). To my knowledge, this information is not available in any information we do have about the target platform. Am I missing something here?

That is correct. The check would need to be dynamic in that it must be part of the application. Something like what Kito suggests with processing /proc/cpuinfo. Longer term, it would be good to have the kernel actually check the riscv tag, but that leaves the older versions still in this state where things are easy to mis-configure.

clang is inherently a cross compiler. We can't ask question of the host environment (e.g. kernel version). To my knowledge, this information is not available in any information we do have about the target platform. Am I missing something here?

We could check with /proc/cpuinfo like -mcpu=native for most other targets, V only appeared in the /proc/cpuinfo only if V is supported in kernel[2].

Doing something specific for -mcpu=native would seem reasonable to me. Can you point me to a similiar warning on another target? If so, I can figure out a patch to warn on +v without the entry in /proc/cpuinfo.

Ideally this issue should resolved by checking Tag_RISCV_arch[1] in kernel loader.

Very much out of scope for this patch. :)

Also, not "kernel loader". This would be the user-land dynamic loader at most.

  1. clang is inherently a cross compiler. We can't ask question of the host environment (e.g. kernel version). To my knowledge, this information is not available in any information we do have about the target platform. Am I missing something here?

That is correct. The check would need to be dynamic in that it must be part of the application. Something like what Kito suggests with processing /proc/cpuinfo. Longer term, it would be good to have the kernel actually check the riscv tag, but that leaves the older versions still in this state where things are easy to mis-configure.

Inserting runtime checks is well out of scope for the compiler. If you're interested in this, working to mature the Tag_RISCV_arch feature would be my recommendation.

Very much out of scope for this patch. :)

Also, not "kernel loader". This would be the user-land dynamic loader at most.

It's kernel loader and we need checked before running the user-space program, and not every program will invoke dynamic linker/loader like static linked executable, but anyway that's out of scope here.