Trying to reduce the number of masked loads in favour of more unpklo/hi
instructions. Both ISD::ZEXTLOAD and ISD::SEXTLOAD are supported to extensions
from legal types.
Both of normal and masked loads test cases added to guard compile crash.
Differential D120953
[AArch64][SelectionDAG] Supports unpklo/hi instructions to reduce the number of loads Allen on Mar 3 2022, 6:19 PM. Authored by
Details Trying to reduce the number of masked loads in favour of more unpklo/hi Both of normal and masked loads test cases added to guard compile crash.
Diff Detail
Event Timeline
Comment Actions ping ?
Comment Actions Hi @Allen, the codegen for the tests looks good to me and the optimisation seems sensible. However, I think perhaps what @paulwalker-arm meant was that we should set all expand types for all integer combinations to be Expand, then selectively mark some as Legal. Can you confirm this is what you meant @paulwalker-arm? Comment Actions Sorry for the delay. Yes @david-arm that is what I meant. The current set of exclusions looks incomplete, for example, based on the intent of the patch I'd suggest nxv16i8 -> nxv16i32 also wants to be prevented. So it seems safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. The first part of this is currently done for floating point scalable vector types so I think we can just replace the fp_scalable_vector_valuetypes iterators it uses with scalable_vector_valuetypes and remove the floating point related comments. With that all said I now have a bigger concern with this patch. Although the code quality for masked loads and stores is improved, the same is unlikely for the normal loads and store to which this test is also applied. I say this because for those cases no explicit unpacking is required (for the data or the predicate) but after this patch they'll started to be generated. For example ptrue p0.d ld1sw { z0.d }, p0/z, [x0] ld1sw { z1.d }, p0/z, [x0, #1, mul vl] will become ptrue p0.s ld1w { z1.s }, p0/z, [x0] sunpklo z0.d, z1.s sunpkhi z1.d, z1.s I've just tried this patch and there's an even bigger problem in that marking these operations as Expand triggers fixed length specific code in DAGCombine that results in a compiler assert/crash when passed %wide.load = load <vscale x 4 x i32>, <vscale x 4 x i32>* %base %res = sext <vscale x 4 x i32> %wide.load to <vscale x 4 x i64> This makes me think we first need to tighten up the handling of normal loads and stores to maintain the existing code quality and then apply the restrictions necessary for the masked varieties. Comment Actions Thanks very much, I'm happy to try fix the above assert/crash firstly in another commit, and do you have some advice on how to figure out all those we directly support ? Comment Actions By directly supported I mean those which are legal and thus have isel patterns. Which I think boils down to: for (auto Op : {ISD::ZEXTLOAD, ISD::SEXTLOAD}) { setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i8, Legal); setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i16, Legal); setLoadExtAction(Op, MVT::nxv2i64, MVT::nxv2i32, Legal); setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i8, Legal); setLoadExtAction(Op, MVT::nxv4i32, MVT::nxv4i16, Legal); setLoadExtAction(Op, MVT::nxv8i16, MVT::nxv8i8, Legal); } I should also mention that downstream we had started work to use separate legalisation tables for each of LOAD, MLOAD and MGATHER but parked it until there was a real need. Perhaps this is that need and we should pick that work back up, but let's see how this work plays out first. Comment Actions Thanks very much, this version I firstly fix the crash as you mention, and this version only play as the beginning of such optimisation. If you agree, then I'll refact with your advice on the next version (it is safest to exclude all scalable vector extending loads/truncating stores and then selectively enable those which we directly support. ), and add more cases to guard them.
Comment Actions Add a comment TODO as there are 2 cases will fail if I delete the conditon change of isFixedLengthVector derectly LLVM :: CodeGen/AArch64/insert-subvector-res-legalization.ll LLVM :: CodeGen/AArch64/sve-fixed-length-ext-loads.ll Comment Actions Thanks @Allen this works for me. Before accepting I just wanted to double check your previous comment: Is this something you can do for this patch? Or are you wanting to save that work for a follow on patch? Comment Actions hi, @paulwalker-arm: I hope this patch can be accepted firstly. Then I'll start another patch to finish that refactor.
|
@paulwalker-arm do you think it is ok to fix the crash of normal loads ?