Tail-predication is a new form of predication in MVE for vector loops that implicitly predicates the last vector loop iteration by implicitly setting active/inactive lanes, i.e. the tail loop is predicated. To set up a tail-predicated vector loop, we need to know the number of data elements processed by the vector loop, which corresponds to the trip count of the scalar loop. We would like to propagate the scalar trip count to the backend so that it can be picked up by the MVE tail-predication pass.
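To make the role of the scalar trip count concrete, here is a minimal scalar model of a tail-predicated loop in C++. It is only a sketch: `VF`, `laneActive` and `addOneTailPredicated` are hypothetical names chosen for illustration, and it assumes a lane is active exactly when its element index is below the trip count. It does not show how MVE or the vectoriser actually implements this.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical vector width (lanes per vector iteration), for illustration.
constexpr std::size_t VF = 4;

// Assumed predicate: lane `lane` of the vector iteration that starts at
// element `base` is active iff its element index is below the trip count.
bool laneActive(std::size_t base, std::size_t lane, std::size_t tripCount) {
  return base + lane < tripCount;
}

// Adds 1 to every element, VF lanes at a time, without a scalar epilogue.
// Returns the number of vector iterations executed.
std::size_t addOneTailPredicated(std::vector<int> &data) {
  const std::size_t tripCount = data.size(); // scalar trip count
  assert(VF > 0 && "vector width must be non-zero");
  std::size_t iterations = 0;
  for (std::size_t base = 0; base < tripCount; base += VF) {
    for (std::size_t lane = 0; lane < VF; ++lane) {
      // Inactive lanes in the final iteration touch no memory.
      if (laneActive(base, lane, tripCount))
        data[base + lane] += 1;
    }
    ++iterations;
  }
  return iterations;
}
```

With 10 elements and `VF` equal to 4, the loop runs 3 vector iterations, and only lanes 0 and 1 are active in the last one. Without the trip count there would be no way to compute that final mask.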
This implements the approach discussed on the llvm-dev list, see Eli's comment in http://lists.llvm.org/pipermail/llvm-dev/2020-May/141360.html. The approach is based on emitting an intrinsic for deriving the mask. The vectoriser emits this new intrinsic in the vector preheader block when the new TTI hook indicates that the target can lower the intrinsic and that doing so is desired for this loop. For MVE, we do this when the loop is tail-folded, which is the very first step in tail-predicating a loop. For all other targets the intrinsic won't be emitted, because the hook defaults to not requesting it.
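The gating described above can be sketched in standalone C++. Everything in this example is hypothetical: `TargetHooks`, `MVELikeHooks`, `wantsTripCountIntrinsic`, `buildPreheader` and the placeholder string are illustrative stand-ins, not LLVM's API and not the names used in this patch. The sketch only shows the shape of the decision, with a default of "no" and an override for a target that opts in for tail-folded loops.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the target hook; not LLVM's actual interface.
struct TargetHooks {
  virtual ~TargetHooks() = default;
  // Default behaviour: never request the intrinsic.
  virtual bool wantsTripCountIntrinsic(bool loopIsTailFolded) const {
    (void)loopIsTailFolded;
    return false;
  }
};

// Hypothetical MVE-like target: opt in only for tail-folded loops.
struct MVELikeHooks : TargetHooks {
  bool wantsTripCountIntrinsic(bool loopIsTailFolded) const override {
    return loopIsTailFolded;
  }
};

// Models the vector preheader as a list of instruction names and emits the
// placeholder intrinsic only when the hook asks for it.
std::vector<std::string> buildPreheader(const TargetHooks &hooks,
                                        bool loopIsTailFolded) {
  std::vector<std::string> preheader;
  if (hooks.wantsTripCountIntrinsic(loopIsTailFolded))
    preheader.push_back("hypothetical.tripcount.intrinsic");
  return preheader;
}
```

Keeping the default at "do not emit" means targets that never override the hook see no change in the generated IR.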
This change will be followed up by MVE-specific changes to lower this intrinsic.
Is IntrNoDuplicate here actually semantically significant? The LangRef explanation doesn't really indicate why it needs to be noduplicate.
Please use LLVMMatchType/LLVMScalarOrSameVectorWidth to ensure the argument/result types match.