Page MenuHomePhabricator

[AArch64][SVE] Update the definition of AdvSIMD_GatherLoad_VecTorBase_Intrinsic

Authored by andwar on Dec 20 2019, 8:39 AM.



Add missing ImmArg in the definition of AdvSIMD_GatherLoad_VecTorBase_Intrinsic

Diff Detail

Event Timeline

andwar created this revision.Dec 20 2019, 8:39 AM
Herald added a project: Restricted Project. · View Herald Transcript
efriedma accepted this revision.Dec 20 2019, 1:53 PM

Would it make sense to merge int_aarch64_sve_ld1_gather and int_aarch64_sve_ld1_gather_imm into one intrinsic? They're sort of similar... maybe not worth messing with it, though.

Otherwise LGTM.

This revision is now accepted and ready to land.Dec 20 2019, 1:53 PM

Thank you for taking a look @eli.friedman. I've kept int_aarch64_sve_ld1_gather and int_aarch64_sve_ld1_gather_imm separate to better reflect the difference in the semantics of the corresponding instructions. I think that merging the two would obfuscate that separation a bit (though reduce code, which is usually a good thing). Keeping it as is would be my preference, but I might be missing the bigger picture?

better reflect the difference in the semantics of the corresponding instructions

You could make the argument that the base in int_aarch64_sve_ld1_gather is a pointer, and we should semantically require that all the loaded values should point into the same memory allocation. That would be a significant semantic difference. (The ACLE distinguishes between "scalar base, vector offset in bytes" and "vector base, scalar offset in bytes".) If that's the distinction we want to make, though, we shouldn't add ImmArg to int_aarch64_sve_ld1_gather_imm; instead, we should rename it to int_aarch64_sve_ld1_gather_scalaroffset, and add patterns for a non-immediate scalar offset.

andwar updated this revision to Diff 238058.Jan 14 2020, 11:39 AM
  • Reverted changes from the previous patch
  • Renamed int_aarch64_sve_ld1_gather_imm to int_aarch64_sve_ld1_gather_scalar_offset
  • Renamed sve-intrinsics-gather-loads-vector-base.ll as sve-intrinsics-gather-loads-vector-base-imm-offset.ll
  • Added DAG combine rule for GLD1_IMM for scenarios where the offset is a non-immediate scalar
  • Added sve-intrinsics-gather-loads-vector-base-scalar-offset.ll test file

@efriedma Thank you for the suggestions - now I see that I was missing a case here.

This makes sense.


Do you need to handle the possibility that Offset is an immediate, but can't be encoded into a GLD1_IMM?

andwar updated this revision to Diff 238223.Jan 15 2020, 5:17 AM
  • Handled casses when the immediate offset is out of range (add relevant tests too)
  • Tweaked comments to better reflect the distinction that the ACLE makes ("scalar base, vector offsets" vs "vector base, scalar offset")
  • Updated commit msg
andwar updated this revision to Diff 238322.Jan 15 2020, 10:54 AM

Similarly to gather loads, I've updated intrinisics for scatter stores ("vector base, scalar offset").

Also updated the intrinsics names in sve-gather-scatter-dag-combine.ll.

efriedma added inline comments.Jan 15 2020, 11:28 AM

"GLD1_IMM requires that the offset is an immediate in the range 0-31" is not correct, in general; the immediate is multiplied by the element size.

andwar updated this revision to Diff 238444.Jan 16 2020, 3:07 AM
andwar marked 2 inline comments as done.
  • Fix the offset limit in GLD1_IMM and SST1_IMM (now it depends on the size of the underylying datatype)
  • Update the tests accordingly
sdesmalen added inline comments.Jan 16 2020, 4:05 AM

This code should not interpret the offset as a byte offset and then scale the offset limit, but rather scale the offset with the element size before swapping Base and Offset.

Perhaps we should rename the intrinsic to llvm.aarch64.sve.ld1.gather.scalar.index to clarify the distinction between it being a scaled and unscaled offset (this also gives it the same name as in the ACLE and the architecture spec).


this needs to be a mov w8, #128 (scaled by the element size, 4) instead.

sdesmalen requested changes to this revision.Jan 16 2020, 4:05 AM
This revision now requires changes to proceed.Jan 16 2020, 4:05 AM
sdesmalen added inline comments.Jan 16 2020, 6:51 AM

Sorry, I pasted this with the wrong example. My comment related to llvm.aarch64.sve.ld1.gather.scalar.offset.nxv4i32.nxv4i32.

andwar updated this revision to Diff 238536.Jan 16 2020, 9:59 AM

Fix the scaling of the offset

When switching from GLD1_IMM to GLD1 or GLD1_UXTW, the original index needs to be scaled by the size of the underlying data to be loaded. Also, the index needs to be in the range [0, 31]. This patch addresses that.


  • Renamed llvm.aarch64.sve.ld1.gather.scalar.offset to llvm.aarch64.sve.ld1.gather.scalar.index
  • Renamed the corresponding test files
  • Updated the tests so that 32 is used as the lowest out of range value for an immediate index
  • Similar changes for scatter stores
efriedma added inline comments.Jan 16 2020, 2:32 PM

Does it actually make sense to expose both llvm.aarch64.sve.ld1.gather.scalar.index and llvm.aarch64.sve.ld1.gather.scalar? Yes, the ACLE has both "(vector base, scalar index)" and "(vector base, scalar offset in bytes)" intrinsics, but the "(vector base, scalar index)" intrinsics don't map to any single instruction if the index isn't a small constant. You have to emit a separate shift instruction.

sdesmalen added inline comments.Jan 17 2020, 2:07 AM

You're absolutely right; when I checked the ACLE document, I completely overlooked the intrinsics for scalar byte offsets. I agree it makes little sense to support two intrinsics in that case. Sorry for the confusion!

andwar updated this revision to Diff 238744.Jan 17 2020, 4:23 AM

Revert previous patch and settle on llvm.aarch64.sve.ld1.gather.scalar.offset

Based on the overall feedback, llvm.aarch64.sve.ld1.gather.scalar.offset feels sufficient to cover all cases:

  • For "vector base, scalar index" ACLE intrinisics we will have to do the scaling in clang. It is then mapped to GLD1_IMM provided that the index is within the supported range.
  • For "vector base, scalar offset" we have 1:1 mapping between the ACLE and IR intrinsic and then in DAG Combine we map it to either GLD1 or GLD1_UXTW
  • For "vector base, scalar index" ACLE intriniscs where the index is out-of-range, we fall-back to "vector base, scalar offset"

I've kept the changes for scatter stores.

andwar marked 3 inline comments as done.Jan 20 2020, 1:49 AM
andwar added inline comments.

The offset in llvm.aarch64.sve.ld1.gather.scalar.offset.nxv4i32.nxv4i32 represent bytes, so it doesn't require scaling.

sdesmalen accepted this revision.Jan 20 2020, 3:27 AM



nit: SrcEltVT is fixed-sized, so you can ask for getFixedSize() here.


nit: !OffsetConst


nit: perhaps add an assert here that Base.getValueType().getSimpleVT().SimpleTy == MVT::nxv2i64 ?


Maybe worth adding one test-case with a negative value (that should be covered by the use of getZExtValue()) ?

This revision is now accepted and ready to land.Jan 20 2020, 3:27 AM
This revision was automatically updated to reflect the committed changes.
andwar marked an inline comment as done.