Following on from https://reviews.llvm.org/D128902, lower DUPLANE128 to LD1RQD. This also introduces some DAGCombine logic to simplify bitcasts out of loading logic to result in less logically redundant patterns being added to instruction selection
Diff Detail
Unit Tests
Event Timeline
Following an offline talk, ld1rqd/w/h/s needs to respect the original width of the load type due to big endian targets
llvm/include/llvm/Target/TargetSelectionDAG.td | ||
---|---|---|
709 | I know copied this from extract above but I don't get why the operands are in reverse order? | |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
19195 | nit: this can be moved to closer to use (NewInsert) | |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
6910 | nit: align with above pattern |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
---|---|---|
19186–19193 | NewVT = VT.changeTypeToInteger()? | |
19205–19207 | Do you need to care what Bitcast.getOperand(0) is? I think we're just simplifying the DAG to remove redundant bitcasts to aid isel. |
Sorry, I rushed to suggest a code improvements without first verifying the correctness of the patches intent.
llvm/include/llvm/Target/TargetSelectionDAG.td | ||
---|---|---|
709 | Can this be SDTCisSameAs<0, 1> | |
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp | ||
19183 | The goal here is to replace duplane128(insert_subvector(x, bitcast(y), idx1), idx2) with bitcast(duplane128(insert_subvector(new_x,y,idx),idx) but that is only safe under specific instances of insert_subvector. It's critical that y is a 128bit fixed length vector and idx1==idx2. Given the use of DAG.getUNDEF you also require x to be undef. With these requirements in place I think NewVT becomes getPackedSVEVectorVT(y->getValueType()->getVectorElementType()). | |
19191–19192 | I still don't know why this matters? | |
llvm/lib/Target/AArch64/SVEInstrFormats.td | ||
6909–6910 | Loads and store are the exception to the rule when it comes to adding patterns to the multiclass. You'll see this with the scalar versions of ld1r. The reason being the B,H,S,D forms are not hidden with the multiclass like they are for say the arithmetic instructions. Having the patterns outside (i.e. within AArch64InstrInfo.td) means we can handle the floating point operations as well as make optimal use of the addressing modes. | |
llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll | ||
592–594 | Based on the patch I think you should simplify all ld1rq# tests by removing the constants and just have the test load the data explicitly. This will also help in the future if there turns out to be a better way to compute the constants vectors these tests are doing. So for example: define dso_local <vscale x 2 x double> @dupq_ld1rqd_f64(ptr %a) { %1 = load <2 x double>, ptr %a %2 = tail call fast <vscale x 2 x double> @llvm.vector.insert.nxv2f64.v2f64(<vscale x 2 x double> undef, <2 x double> %1, i64 0) %3 = tail call fast <vscale x 2 x double> @llvm.aarch64.sve.dupq.lane.nxv2f64(<vscale x 2 x double> %2, i64 0) ret <vscale x 2 x double> %3 } I also believe the tests are better placed in sve-ld1r.ll. |
llvm/test/CodeGen/AArch64/sve-intrinsics-perm-select.ll | ||
---|---|---|
592–594 | I'd like to backtrack slightly. I still believe we want simpler tests added to sve-ld1r.ll for the new isel patterns. Then I guess these existing tests are required to show the need for the DAG combine. For this reason I think you want two patches, one for the isel then a second for the DAG combine |
This is being split into two patches, the first of which being https://reviews.llvm.org/D130010
I know copied this from extract above but I don't get why the operands are in reverse order?