This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads
ClosedPublic

Authored by Allen on Mar 3 2022, 6:19 PM.

Details

Summary

Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.

Both of normal and masked loads test cases added to guard compile crash.

Diff Detail

Event Timeline

Allen created this revision.Mar 3 2022, 6:19 PM
Allen requested review of this revision.Mar 3 2022, 6:19 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 3 2022, 6:19 PM
david-arm added inline comments.Mar 4 2022, 3:04 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246

So this is really just an optimisation where you're trying to reduce the number of loads we perform, in favour of more unpklo/hi instructions. It seems to make sense and looks like an improvement, but this change is very specific to one set of types and one extension type. What about ISD::ZEXTLOAD and other extensions from legal types, i.e.

%wide.masked.load = call <vscale x 8 x i16> @llvm.masked.load.nxv8i16.p0nxv8i16(<vscale x 8 x i16>* %base, i32 2, <vscale x 8 x i1> %mask, <vscale x 8 x i16> undef)
%res = sext <vscale x 8 x i16> %wide.masked.load to <vscale x 8 x i64>

I think if we go down this route it's worth adding all other extension types from legal inputs too.

paulwalker-arm added inline comments.Mar 4 2022, 3:12 AM
llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246

It seems we're currently marking all the scalable floating point types as Expand, so we should do the same for all the scalable integer types and then selectively enable the ones that make sense. Although I'd actually prefer the common default to be Expand rather than forcing all targets to do the initialisation, but I guess that's outside the scope of this patch.

Allen updated this revision to Diff 413200.Mar 4 2022, 11:37 PM

update to support more type and zero extend

Allen retitled this revision from [AArch64][SelectionDAG] Prevent legality of extloads nxv4i64 from nxv4i32 to [AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads.Mar 4 2022, 11:38 PM
Allen edited the summary of this revision. (Show Details)
Allen updated this revision to Diff 413346.Mar 6 2022, 11:22 PM
Allen marked an inline comment as done.
Allen updated this revision to Diff 413486.Mar 7 2022, 7:57 AM

add test case in file sve-intrinsics-mask-loads.ll

Allen marked an inline comment as done.Mar 8 2022, 7:05 AM

ping ?

llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
1246

Thanks very much, and also update comment.
1、ISD::ZEXTLOAD added
2、types vscale x 16 x i8, vscale x 8 x i16 and vscale x 4 x i32 are considered, please let me know if more types need be added.

1246

Thanks very much.

As not all of the scalable integer types will be benifit from the disable of extending loads, so I only choose part of them, does it right ?

for example, now ld1h { z0.s }, p0/z, [x0] is already fine, and there is extra and z0.s, z0.s, #0xffff after disable the extending loads.

%wide.masked.load = call <vscale x 4 x i16> @llvm.masked.load.nxv4i16(<vscale x 4 x i16>* %a, i32 4, <vscale x 4 x i1> %mask, <vscale x 4 x i16> undef)
  %res = zext <vscale x 4 x i16> %wide.masked.load to <vscale x 4 x i32>

Hi @Allen, the codegen for the tests looks good to me and the optimisation seems sensible. However, I think perhaps what @paulwalker-arm meant was that we should set all expand types for all integer combinations to be Expand, then selectively mark some as Legal. Can you confirm this is what you meant @paulwalker-arm?

Allen marked an inline comment as done.Mar 10 2022, 4:00 AM

Hi @Allen, the codegen for the tests looks good to me and the optimisation seems sensible. However, I think perhaps what @paulwalker-arm meant was that we should set all expand types for all integer combinations to be Expand, then selectively mark some as Legal. Can you confirm this is what you meant @paulwalker-arm?

ok, Thanks very much. I wait @paulwalker-arm to check that.

Sorry for the delay. Yes @david-arm that is what I meant. The current set of exclusions looks incomplete, for example, based on the intent of the patch I'd suggest nxv16i8 -> nxv16i32 also wants to be prevented. So it seems safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. The first part of this is currently done for floating point scalable vector types so I think we can just replace the fp_scalable_vector_valuetypes iterators it uses with scalable_vector_valuetypes and remove the floating point related comments.

With that all said I now have a bigger concern with this patch. Although the code quality for masked loads and stores is improved, the same is unlikely for the normal loads and store to which this test is also applied. I say this because for those cases no explicit unpacking is required (for the data or the predicate) but after this patch they'll started to be generated.

For example

ptrue	p0.d
ld1sw	{ z0.d }, p0/z, [x0]
ld1sw	{ z1.d }, p0/z, [x0, #1, mul vl]

will become

ptrue	p0.s
ld1w	{ z1.s }, p0/z, [x0]
sunpklo	z0.d, z1.s
sunpkhi	z1.d, z1.s

I've just tried this patch and there's an even bigger problem in that marking these operations as Expand triggers fixed length specific code in DAGCombine that results in a compiler assert/crash when passed

%wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %base
%res = sext <vscale x 4 x i32> %wide.load to <vscale x 4 x i64>

This makes me think we first need to tighten up the handling of normal loads and stores to maintain the existing code quality and then apply the restrictions necessary for the masked varieties.

Allen added a comment.Mar 10 2022, 5:18 PM

@paulwalker-arm

Thanks very much, I'm happy to try fix the above assert/crash firstly in another commit, and do you have some advice on how to figure out all those we directly support ?

By directly supported I mean those which are legal and thus have isel patterns. Which I think boils down to:

for (auto Op : {ISD::ZEXTLOAD, ISD::SEXTLOAD}) {
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i8, Legal);
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i16, Legal);
  setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i32, Legal);
  setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i8, Legal);
  setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i16, Legal);
  setLoadExtAction(Op, MVT::nxv8i16, MVT::nxv8i8, Legal);
}

I should also mention that downstream we had started work to use separate legalisation tables for each of LOAD, MLOAD and MGATHER but parked it until there was a real need. Perhaps this is that need and we should pick that work back up, but let's see how this work plays out first.

Allen updated this revision to Diff 414841.Mar 12 2022, 7:52 AM
Allen edited the summary of this revision. (Show Details)

Fix the compile crash of normal loads

Allen added a comment.EditedMar 12 2022, 8:07 AM

By directly supported I mean those which are legal and thus have isel patterns. Which I think boils down to:

Thanks very much, this version I firstly fix the crash as you mention, and this version only play as the beginning of such optimisation.

If you agree, then I'll refact with your advice on the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. ), and add more cases to guard them.

Matt added a subscriber: Matt.Mar 17 2022, 5:55 PM
paulwalker-arm added inline comments.Mar 18 2022, 5:39 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497

Sorry for the delay but I've been ponding this on and off. I cannot shake the feeling that any "is vector" check here is artificial and so limiting it to only fixed length vectors also seems artificial.

Personally I think isVector() should be removed and any negative effects on code generation being the result of that target's implementation of isVectorLoadExtDesirable() likely being at fault.

That said, I understand this might be above and beyond the work you want to carry out so I guess you making the current restriction slightly less artificial is a nice step in the right direction.

11506

However, the isVectorLoadExtDesirable() question will always be pertinent regardless of vector type and so you shouldn't need this change.

Allen updated this revision to Diff 416675.EditedMar 19 2022, 2:19 AM

Add a comment TODO as there are 2 cases will fail if I delete the conditon change of isFixedLengthVector derectly

LLVM :: CodeGen/AArch64/insert-subvector-res-legalization.ll
LLVM :: CodeGen/AArch64/sve-fixed-length-ext-loads.ll

Thanks @Allen this works for me. Before accepting I just wanted to double check your previous comment:

Thanks very much, this version I firstly fix the crash as you mention, and this version only play as the beginning of such optimisation.

If you agree, then I'll refact with your advice on the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. ), and add more cases to guard them.

Is this something you can do for this patch? Or are you wanting to save that work for a follow on patch?

Allen marked 2 inline comments as done.EditedMar 21 2022, 8:05 AM

Thanks @Allen this works for me. Before accepting I just wanted to double check your previous comment:

Thanks very much, this version I firstly fix the crash as you mention, and this version only play as the beginning of such optimisation.

If you agree, then I'll refact with your advice on the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. ), and add more cases to guard them.

Is this something you can do for this patch? Or are you wanting to save that work for a follow on patch?

hi, @paulwalker-arm:

I hope this patch can be accepted firstly. Then I'll start another patch to finish that refactor.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497

@paulwalker-arm do you think it is ok to fix the crash of normal loads ?

11497

Thanks very much.
If you agree, I can add a comment "TODO: isFixedLengthVector() should be removed and any negative effects on code generation being the result of that target's implementation of isVectorLoadExtDesirable()", and try to refactor that with a separate commit.

11506

Done, thanks

paulwalker-arm accepted this revision.Mar 21 2022, 8:38 AM
paulwalker-arm added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
11497

@Allen I'm happy with this change. It doesn't actually fix the bug that caused the crash but does maintain existing behaviour so that buggy code path is not hit (at least with the current set of tests).

The bug itself is within DAGCombiner::CombineExtLoad whereby is has a !DstVT.isScalableVector() assert that appears after a call to getVectorNumElements() so the code will fail before ever hitting that assert. The real fix is to remove the assert and just add

if (DstVT.isScalableVector())
  return SDValue();

to the top of DAGCombiner::CombineExtLoad. As you've worked round the issue it's not strictly necessary but if you could add that fix to this patch then it'll stop anybody else from hitting it.

This revision is now accepted and ready to land.Mar 21 2022, 8:38 AM
This revision was landed with ongoing or failed builds.Mar 21 2022, 8:48 AM
This revision was automatically updated to reflect the committed changes.
Allen marked 2 inline comments as done.