Page MenuHomePhabricator

[AArch64][SVE] Lower DUPLANE128 to LD1RQD

Authored by MattDevereau on Jul 14 2022, 4:57 AM.



Following on from, lower DUPLANE128 to LD1RQD. This also introduces some DAGCombine logic to simplify bitcasts out of loading logic to result in less logically redundant patterns being added to instruction selection

Diff Detail

Event Timeline

MattDevereau created this revision.Jul 14 2022, 4:57 AM
Herald added a project: Restricted Project. · View Herald Transcript
MattDevereau requested review of this revision.Jul 14 2022, 4:57 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 14 2022, 4:57 AM

Following an offline talk, ld1rqd/w/h/s needs to respect the original width of the load type due to big endian targets

Matt added a subscriber: Matt.Jul 14 2022, 10:50 AM
c-rhodes added inline comments.Jul 15 2022, 2:35 AM

I know copied this from extract above but I don't get why the operands are in reverse order?


nit: this can be moved to closer to use (NewInsert)


nit: align with above pattern

paulwalker-arm added inline comments.Jul 15 2022, 3:34 AM

NewVT = VT.changeTypeToInteger()?


Do you need to care what Bitcast.getOperand(0) is? I think we're just simplifying the DAG to remove redundant bitcasts to aid isel.

MattDevereau retitled this revision from [AArch64][SVE] Lower DUPELANE128 to LD1RQD to [AArch64][SVE] Lower DUPLANE128 to LD1RQD.
MattDevereau set the repository for this revision to rG LLVM Github Monorepo.

Sorry, I rushed to suggest a code improvements without first verifying the correctness of the patches intent.


Can this be SDTCisSameAs<0, 1>


The goal here is to replace duplane128(insert_subvector(x, bitcast(y), idx1), idx2) with bitcast(duplane128(insert_subvector(new_x,y,idx),idx) but that is only safe under specific instances of insert_subvector.

It's critical that y is a 128bit fixed length vector and idx1==idx2. Given the use of DAG.getUNDEF you also require x to be undef.

With these requirements in place I think NewVT becomes getPackedSVEVectorVT(y->getValueType()->getVectorElementType()).


I still don't know why this matters?


Loads and store are the exception to the rule when it comes to adding patterns to the multiclass. You'll see this with the scalar versions of ld1r. The reason being the B,H,S,D forms are not hidden with the multiclass like they are for say the arithmetic instructions. Having the patterns outside (i.e. within means we can handle the floating point operations as well as make optimal use of the addressing modes.


Based on the patch I think you should simplify all ld1rq# tests by removing the constants and just have the test load the data explicitly. This will also help in the future if there turns out to be a better way to compute the constants vectors these tests are doing. So for example:

define dso_local <vscale x 2 x double> @dupq_ld1rqd_f64(ptr %a) {
  %1 = load <2 x double>, ptr %a
  %2 = tail call fast <vscale x 2 x double> @llvm.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> undef, <2 x double> %1, i64 0)
  %3 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.dupq.lane.nxv2f64(<vscale x 2 x double> %2, i64 0)
  ret <vscale x 2 x double> %3

I also believe the tests are better placed in sve-ld1r.ll.


I'd like to backtrack slightly. I still believe we want simpler tests added to sve-ld1r.ll for the new isel patterns. Then I guess these existing tests are required to show the need for the DAG combine. For this reason I think you want two patches, one for the isel then a second for the DAG combine

MattDevereau abandoned this revision.Jul 18 2022, 7:29 AM

This is being split into two patches, the first of which being