Add missing ImmArg in the definition of AdvSIMD_GatherLoad_VecTorBase_Intrinsic
Would it make sense to merge int_aarch64_sve_ld1_gather and int_aarch64_sve_ld1_gather_imm into one intrinsic? They're sort of similar... maybe not worth messing with it, though.
Thank you for taking a look @eli.friedman. I've kept int_aarch64_sve_ld1_gather and int_aarch64_sve_ld1_gather_imm separate to better reflect the difference in the semantics of the corresponding instructions. I think that merging the two would obfuscate that separation a bit (though reduce code, which is usually a good thing). Keeping it as is would be my preference, but I might be missing the bigger picture?
better reflect the difference in the semantics of the corresponding instructions
You could make the argument that the base in int_aarch64_sve_ld1_gather is a pointer, and we should semantically require that all the loaded values should point into the same memory allocation. That would be a significant semantic difference. (The ACLE distinguishes between "scalar base, vector offset in bytes" and "vector base, scalar offset in bytes".) If that's the distinction we want to make, though, we shouldn't add ImmArg to int_aarch64_sve_ld1_gather_imm; instead, we should rename it to int_aarch64_sve_ld1_gather_scalaroffset, and add patterns for a non-immediate scalar offset.
- Reverted changes from the previous patch
- Renamed int_aarch64_sve_ld1_gather_imm to int_aarch64_sve_ld1_gather_scalar_offset
- Renamed sve-intrinsics-gather-loads-vector-base.ll to sve-intrinsics-gather-loads-vector-base-imm-offset.ll
- Added DAG combine rule for GLD1_IMM for scenarios where the offset is a non-immediate scalar
- Added sve-intrinsics-gather-loads-vector-base-scalar-offset.ll test file
@efriedma Thank you for the suggestions - now I see that I was missing a case here.
- Handled cases where the immediate offset is out of range (added relevant tests too)
- Tweaked comments to better reflect the distinction that the ACLE makes ("scalar base, vector offsets" vs "vector base, scalar offset")
- Updated commit msg
This code should not interpret the offset as a byte offset and then scale the offset limit, but rather scale the offset by the element size before swapping Base and Offset.
Perhaps we should rename the intrinsic to llvm.aarch64.sve.ld1.gather.scalar.index to clarify the distinction between it being a scaled and unscaled offset (this also gives it the same name as in the ACLE and the architecture spec).
This needs to be a mov w8, #128 (the index scaled by the element size, 4) instead.
Sorry, I pasted this with the wrong example. My comment related to llvm.aarch64.sve.ld1.gather.scalar.offset.nxv4i32.nxv4i32.
Fix the scaling of the offset
When switching from GLD1_IMM to GLD1 or GLD1_UXTW, the original index needs to be scaled by the size of the underlying data to be loaded. Also, the index needs to be in the range [0, 31]. This patch addresses that.
- Renamed llvm.aarch64.sve.ld1.gather.scalar.offset to llvm.aarch64.sve.ld1.gather.scalar.index
- Renamed the corresponding test files
- Updated the tests so that 32 is used as the lowest out of range value for an immediate index
- Similar changes for scatter stores
Does it actually make sense to expose both llvm.aarch64.sve.ld1.gather.scalar.index and llvm.aarch64.sve.ld1.gather.scalar? Yes, the ACLE has both "(vector base, scalar index)" and "(vector base, scalar offset in bytes)" intrinsics, but the "(vector base, scalar index)" intrinsics don't map to any single instruction if the index isn't a small constant. You have to emit a separate shift instruction.
You're absolutely right; when I checked the ACLE document, I completely overlooked the intrinsics for scalar byte offsets. I agree it makes little sense to support two intrinsics in that case. Sorry for the confusion!
Revert previous patch and settle on llvm.aarch64.sve.ld1.gather.scalar.offset
Based on the overall feedback, llvm.aarch64.sve.ld1.gather.scalar.offset feels sufficient to cover all cases:
- For "vector base, scalar index" ACLE intrinsics we will have to do the scaling in clang. The result is then mapped to GLD1_IMM, provided that the index is within the supported range.
- For "vector base, scalar offset" ACLE intrinsics we have a 1:1 mapping between the ACLE and IR intrinsics, and in DAG combine we map it to either GLD1 or GLD1_UXTW.
- For "vector base, scalar index" ACLE intrinsics where the index is out of range, we fall back to "vector base, scalar offset".
I've kept the changes for scatter stores.
The offset in llvm.aarch64.sve.ld1.gather.scalar.offset.nxv4i32.nxv4i32 represents bytes, so it doesn't require scaling.
nit: SrcEltVT is fixed-sized, so you can ask for getFixedSize() here.
nit: perhaps add an assert here that Base.getValueType().getSimpleVT().SimpleTy == MVT::nxv2i64?
Maybe worth adding one test case with a negative value (that should be covered by the use of getZExtValue())?