This commit extends SVEIntrinsicOpts::optimizeConvertFromSVBool to
identify and remove longer chains of redundant SVE reinterpret
intrinsics. For example, the following chain of SVE reinterprets is
now recognised as redundant:
  %a = <vscale x 2 x i1>
  %1 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 2 x i1> %a)
  %2 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %1)
  %3 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %2)
  %4 = <vscale x 4 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %3)
  %5 = <vscale x 16 x i1> @llvm.aarch64.sve.convert.to.svbool(<vscale x 4 x i1> %4)
  %6 = <vscale x 2 x i1> @llvm.aarch64.sve.convert.from.svbool(<vscale x 16 x i1> %5)
  ret <vscale x 2 x i1> %6
and will be replaced with:
  ret <vscale x 2 x i1> %a
Eliminating these chains can sometimes mean fewer unnecessary
loads/stores are emitted when lowering to assembly.
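For reference in the review discussion below, here is a minimal sketch of the chain walk being described. It is a paraphrase rather than a quote of the patch: starting at a convert.from.svbool call, a Cursor steps back through the operands of to/from.svbool conversions, remembering the earliest value with the same type as the original instruction (named EarliestRemoval here to match the review's terminology) and collecting the intermediate conversions as removal candidates. The function name and signature are illustrative only, and the replacement/erasure step is omitted.

  // Illustrative sketch only; findRedundantSVBoolChain is not a real LLVM
  // function, and the RAUW/removal of dead conversions is not shown.
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/IR/DerivedTypes.h"
  #include "llvm/IR/IntrinsicInst.h"
  #include "llvm/IR/IntrinsicsAArch64.h"

  using namespace llvm;

  // Walk back from a convert.from.svbool call I through the chain of
  // to/from.svbool conversions. Returns the earliest value with the same
  // type as I (a valid replacement for it), or nullptr if none exists, and
  // fills Candidates with the intermediate conversions seen along the way.
  static Value *
  findRedundantSVBoolChain(IntrinsicInst *I,
                           SmallVectorImpl<Instruction *> &Candidates) {
    auto *IVTy = cast<VectorType>(I->getType());
    Value *Cursor = I->getOperand(0);
    Value *EarliestRemoval = nullptr;

    while (Cursor) {
      // A narrower type in the chain would require zeroing the extra lanes,
      // which breaks the equivalence, so stop there.
      auto *CursorVTy = cast<VectorType>(Cursor->getType());
      if (CursorVTy->getElementCount().getKnownMinValue() <
          IVTy->getElementCount().getKnownMinValue())
        break;

      // Any value of the same type as I is a viable replacement.
      if (Cursor->getType() == IVTy)
        EarliestRemoval = Cursor;

      // The chain ends at anything that is not a to/from.svbool conversion
      // (for example, a function argument such as %a).
      auto *IntrinsicCursor = dyn_cast<IntrinsicInst>(Cursor);
      if (!IntrinsicCursor ||
          (IntrinsicCursor->getIntrinsicID() !=
               Intrinsic::aarch64_sve_convert_to_svbool &&
           IntrinsicCursor->getIntrinsicID() !=
               Intrinsic::aarch64_sve_convert_from_svbool))
        break;

      Candidates.push_back(IntrinsicCursor);
      Cursor = IntrinsicCursor->getOperand(0);
    }

    return EarliestRemoval;
  }

Note that the walk deliberately continues past the earliest replacement, which is the behaviour the question below probes.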
If you create a test similar to this, but with "<vscale x 2 x i1> %a", is there a bug? From your algorithm above it looks like EarliestRemoval would be "%2 = tail call ...", but we'd keep iterating Cursor until we get to "%a". If I've understood your algorithm correctly, won't that mean we end up deleting %1 and %2, leaving us with this?
define <vscale x 4 x i1> @reinterpret_test_partial_chain(<vscale x 8 x i1> %a) {
}