This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add a subtarget feature to enable unaligned scalar loads and stores
ClosedPublic

Authored by reames on May 20 2022, 12:32 PM.

Details

Summary

A RISCV implementation can choose to implement unaligned load/store support. We currently don't have a way for such a processor to indicate a preference for unaligned load/stores, so add a subtarget feature.

There doesn't appear to be a formal extension for unaligned support. The RISCV Profiles (https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc#rva20u64-profile) docs use the name Zicclsm, but a) that doesn't appear to actually been standardized, and b) isn't quite what we want here anyway due to the perf comment.

Instead, we can follow precedent from other backends and have a feature flag for the existence of misaligned load/stores with sufficient performance that user code should actually use them.

Diff Detail

Event Timeline

reames created this revision.May 20 2022, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 12:32 PM
reames requested review of this revision.May 20 2022, 12:32 PM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2022, 12:32 PM
craig.topper added inline comments.May 20 2022, 12:57 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820

This seems like it should be a different feature for vectors.

reames added inline comments.May 20 2022, 1:09 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820

There might be such a target where scalar vs vector matters, but on the motivating case, this is not expected to matter. We can split later if needed.

Is this meant to be "it works" or "it works without trapping for emulation"? Pretty much every EEI out there has misaligned accesses guaranteed to work, just not quickly, and in those cases you'd still want to avoid them as the inlined byte-wise code is far faster.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11810

If returning true and Fast isn't null we should be setting it; some users pre-initialise it, but not all (I found one in GlobalISel and stopped looking)

craig.topper added inline comments.May 20 2022, 1:14 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820

There are no vector tests and the description in the .td file says “scalar”. The patch should at least be self consistent.

reames added inline comments.May 20 2022, 1:22 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11810

Good catch. I will unconditionally initialize to false to make sure we don't miss any paths. Thanks!

11820

You're absolutely correct on that. I'd originally had it split, then changed my mind. Let me rebase and correct both of those!

craig.topper added inline comments.May 20 2022, 1:39 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11816

Not directly related to this patch, but it could be argued that *Fast = true is incorrect for vectors here. Being element aligned doesn't guarantee optimal access if you're splitting a cache line or other alignment requirements of your memory unit.

Is this meant to be "it works" or "it works without trapping for emulation"? Pretty much every EEI out there has misaligned accesses guaranteed to work, just not quickly, and in those cases you'd still want to avoid them as the inlined byte-wise code is far faster.

From what I can tell looking at existing targets, the three states for unaligned are basically.

  • "not present" - either the hardware doesn't support, or performance is unreasonable
  • "present" - hardware is present, and reasonable performant. unaligned accesses in source are lowered as such.
  • "fast" - hardware is present, optimizer should use when doesn't exist in code already

I tried to follow that convention with this change.

reames updated this revision to Diff 431056.May 20 2022, 2:27 PM

address review comments

reames added inline comments.May 20 2022, 2:32 PM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820

It turns out writing vector tests was a useful exercise. :)

From what I can tell, we aggressively canonicalize towards byte element loads and stores for vectors. As such, no plain vector load or store is every treated as unaligned.

However, some indexed and strided load tests did fail if I asserted the fallthrough return was never taken.

For the moment, I decided to restrict the scope to only the scalar case. Mostly because there look to be enough other opportunities around indexed loads that this didn't seem terribly useful.

With that, I could go in two directions here.

  1. Rename this to scalar and add a separate flag later for vector cases as warranted.
  2. Document the intent that this covers all access, but leave the indexed/strided case currently unimplemented.

I have no strong preference and will do as reviewers request.

asb added inline comments.May 25 2022, 7:31 AM
llvm/lib/Target/RISCV/RISCV.td
435

Nit: reasonable => reasonably

reames marked an inline comment as not done.May 25 2022, 11:01 AM
reames added inline comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820
craig.topper added inline comments.May 25 2022, 11:44 AM
llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11820

Let's rename it to scalar and add a separate flag for vector later as warranted.

reames updated this revision to Diff 432069.May 25 2022, 11:59 AM
reames marked an inline comment as not done.
reames retitled this revision from [RISCV] Add a subtarget feature to enable unaligned loads to [RISCV] Add a subtarget feature to enable unaligned scalar loads and stores.

Revise per reviewer request

craig.topper added inline comments.May 25 2022, 12:07 PM
llvm/lib/Target/RISCV/RISCV.td
435

asb's comment still needs to be addressed.

reames added inline comments.May 25 2022, 12:34 PM
llvm/lib/Target/RISCV/RISCV.td
435

Oops, yep, will rev.

reames updated this revision to Diff 432085.May 25 2022, 12:35 PM

address missed review comment

asb added a comment.May 25 2022, 12:49 PM

I don't think it affects this patch (as splitting scalar vs vector features for alignment makes sense regardless of the frontend option), but I had a quick look at the frontend option for this. Looks like on the GCC side it's -mno-strict-align. On Arm/AArch64 it's spelled -munaligned-access and __ARM_FEATURE_UNALIGNED is set. Unless there's some target generic #define that's set that I'm missing, it feels like it would be useful to set a define for the RISC-V case as well (and agree this with the GNU folks).

One thing I noticed that does affect this patch (and sorry I didn't spot this earlier!), is we should be setting Tag_RISCV_unaligned_access when this feature is set (see here).

I don't think it affects this patch (as splitting scalar vs vector features for alignment makes sense regardless of the frontend option), but I had a quick look at the frontend option for this. Looks like on the GCC side it's -mno-strict-align. On Arm/AArch64 it's spelled -munaligned-access and __ARM_FEATURE_UNALIGNED is set. Unless there's some target generic #define that's set that I'm missing, it feels like it would be useful to set a define for the RISC-V case as well (and agree this with the GNU folks).

I agree this is separate. What a target defaults to should not be blocked on what a frontend might chose to override. I'll note that these flags don't appear to influence x86 handling of the corresponding cases, so unless this is handled somewhere generically I missed, this isn't RISCV specific in any way.

One thing I noticed that does affect this patch (and sorry I didn't spot this earlier!), is we should be setting Tag_RISCV_unaligned_access when this feature is set (see here).

Er, I'm sorry, what? Why?

I agree this spec exists, but what purpose does the attribute bit actually serve? There's no interaction with the dynamic loader here. It's a compiler codegen choice.

If you really want, I'm fine adding the attribute wiring (in a separate patch), but this really looks like a spec that is a) out of sync with reality, and b) serves no obvious purpose.

asb added a comment.May 25 2022, 1:45 PM

I don't think it affects this patch (as splitting scalar vs vector features for alignment makes sense regardless of the frontend option), but I had a quick look at the frontend option for this. Looks like on the GCC side it's -mno-strict-align. On Arm/AArch64 it's spelled -munaligned-access and __ARM_FEATURE_UNALIGNED is set. Unless there's some target generic #define that's set that I'm missing, it feels like it would be useful to set a define for the RISC-V case as well (and agree this with the GNU folks).

I agree this is separate. What a target defaults to should not be blocked on what a frontend might chose to override. I'll note that these flags don't appear to influence x86 handling of the corresponding cases, so unless this is handled somewhere generically I missed, this isn't RISCV specific in any way.

Yep, just mentioning for future reference.

One thing I noticed that does affect this patch (and sorry I didn't spot this earlier!), is we should be setting Tag_RISCV_unaligned_access when this feature is set (see here).

Er, I'm sorry, what? Why?

I agree this spec exists, but what purpose does the attribute bit actually serve? There's no interaction with the dynamic loader here. It's a compiler codegen choice.

If you really want, I'm fine adding the attribute wiring (in a separate patch), but this really looks like a spec that is a) out of sync with reality, and b) serves no obvious purpose.

I'm not sure the attribute is used in the wild in any meaningful way right now, but I think it does have purpose. There's no guarantee that a RISC-V EEI actually provides a trap handler for misaligned accesses, so on some targets (largely embedded), compiling with +unaligned-scalar-mem may result in code that is unsuitable. Given there's no way to describe support for misaligned accesses in the ISA naming string, it's probably similarly useful to the other attributes from the perspective of any tooling working with ELF files. Such tooling might be used to warn / error when a likely build misconfiguration is found within a project.

If I were authoring the patch, I'd likely put the attribute wiring in the same patch (or perhaps if I really wanted to split things, have patch 1 adding the feature but only impacting the attribute, and patch 2 adding codegen tests and changes). But tastes vary and I don't think it makes a big difference for this patch, so I'm happy with whatever sequencing you prefer here.

Technically it's may not will so you can just hard-wire it to 1...

asb added a comment.EditedMay 26 2022, 6:05 AM

This is all that's needed to hook up the appropriate attribute:

diff --git a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp
index 5f9ed77d07cf..ac0c8113135a 100644
--- a/llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp
+++ b/llvm/lib/Target/RISCV/MCTargetDesc/RISCVTargetStreamer.cpp
@@ -50,6 +50,9 @@ void RISCVTargetStreamer::emitTargetAttributes(const MCSubtargetInfo &STI) {
   else
     emitAttribute(RISCVAttrs::STACK_ALIGN, RISCVAttrs::ALIGN_16);
 
+  if (STI.hasFeature(RISCV::FeatureUnalignedScalarMem))
+    emitAttribute(RISCVAttrs::UNALIGNED_ACCESS, RISCVAttrs::ALLOWED);
+
   auto ParseResult = RISCVFeatures::parseFeatureBits(
       STI.hasFeature(RISCV::Feature64Bit), STI.getFeatureBits());
   if (!ParseResult) {
diff --git a/llvm/test/CodeGen/RISCV/attributes.ll b/llvm/test/CodeGen/RISCV/attributes.ll
index 9d9f02ce52cb..0734b5a01b45 100644
--- a/llvm/test/CodeGen/RISCV/attributes.ll
+++ b/llvm/test/CodeGen/RISCV/attributes.ll
@@ -147,6 +147,14 @@
 ; RV64COMBINEINTOZKN: .attribute 5, "rv64i2p0_zbkb1p0_zbkc1p0_zbkx1p0_zkn1p0_zknd1p0_zkne1p0_zknh1p0"
 ; RV64COMBINEINTOZKS: .attribute 5, "rv64i2p0_zbkb1p0_zbkc1p0_zbkx1p0_zks1p0_zksed1p0_zksh1p0"
 
+; RUN: llc -mtriple=riscv32 %s -o - | FileCheck --check-prefix=ALIGNED %s
+; RUN: llc -mtriple=riscv64 %s -o - | FileCheck --check-prefix=ALIGNED %s
+; RUN: llc -mtriple=riscv32 -mattr=+unaligned-scalar-mem %s -o - | FileCheck --check-prefix=UNALIGNED %s
+; RUN: llc -mtriple=riscv64 -mattr=+unaligned-scalar-mem %s -o - | FileCheck --check-prefix=UNALIGNED %s
+
+; ALIGNED-NOT: .attribute 6
+; UNALIGNED: .attribute 6, 1
+
 define i32 @addi(i32 %a) {
   %1 = add i32 %a, 1
   ret i32 %1

I think not emitting Tag_RISCV_unaligned_access in the absence of +unaligned-scalar-mem (as the above patch does) is probably the best we can do, as inline assembly or handwritten .s files may well include misaligned accesses, even without +unaligned-scalar-mem.

For the patch under review, ALIGN and NOALIGN are (to my eyes) reversed in meaning. i.e. I'd expect NOALIGN to cover RUN invocations with -mattr=+unaligned-scalar-mem. But maybe there's an alternate interpretation?

The topic of Tag_RISCV_unaligned_access handling was discussed in today's risc-v sync up call. The conclusion of that discussion is that we're going to leave it unset in this patch, and that I'm going to file an issue against the psABI to clarify a couple points in the spec. Once that clarification has been done, we may revisit setting the tag.

In the short run, we believe that leaving the tag in the unset state is the lowest risk strategy, and the least likely to break assumptions which may have been made about the meaning of the tag.

asb added a comment.EditedMay 26 2022, 8:57 AM

As just discussed in the sync-up call meeting, let's defer the question of setting / not setting the attribute to an upstream discussion in riscv-elf-psabi-doc on the precise semantics of the tag. Philip kindly volunteered to kick off an issue in that repo.

I don't think setting the tag or not needs to block this landing (though the degree to which ELF tags are opt-in is perhaps something that could be further clarified in the psABI doc too).

(EDIT: Ah, Philip beat me to it!)

reames updated this revision to Diff 432325.May 26 2022, 10:07 AM

Address test check comment.

Anything remaining blocking this? I know I need to file the psABI issue and will do so before pushing. Asking about further review items.

asb accepted this revision.May 26 2022, 10:17 AM

Address test check comment.

Anything remaining blocking this? I know I need to file the psABI issue and will do so before pushing. Asking about further review items.

Nothing remaining from me - LGTM.

This revision is now accepted and ready to land.May 26 2022, 10:17 AM
This revision was landed with ongoing or failed builds.May 26 2022, 3:56 PM
This revision was automatically updated to reflect the committed changes.

How to enable this feature through clang? I tried this:

$ ./bin/clang -S t.c -O2 -mllvm -mattr=+unaligned-scalar-mem

But get:
clang (LLVM option parsing): Unknown command line argument '-mattr=+unaligned-scalar-mem'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--mv67t=+unaligned-scalar-mem'?

How to enable this feature through clang? I tried this:

$ ./bin/clang -S t.c -O2 -mllvm -mattr=+unaligned-scalar-mem

But get:
clang (LLVM option parsing): Unknown command line argument '-mattr=+unaligned-scalar-mem'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--mv67t=+unaligned-scalar-mem'?

Try -Xclang -target-feature -Xclang +unaligned-scalar-mem

How to enable this feature through clang? I tried this:

$ ./bin/clang -S t.c -O2 -mllvm -mattr=+unaligned-scalar-mem

But get:
clang (LLVM option parsing): Unknown command line argument '-mattr=+unaligned-scalar-mem'. Try: 'clang (LLVM option parsing) --help'
clang (LLVM option parsing): Did you mean '--mv67t=+unaligned-scalar-mem'?

Try -Xclang -target-feature -Xclang +unaligned-scalar-mem

Thanks for the qucik response, it works for me.