Implements the following intrinsics:
- llvm.aarch64.sve.convert.to.svbool
- llvm.aarch64.sve.convert.from.svbool
For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the
other predicate types: <n x 8 x i1>, <n x 4 x i1> and <n x 2 x i1>.
Differential D74471
[AArch64][SVE] Add predicate reinterpret intrinsics c-rhodes on Feb 12 2020, 1:49 AM. Authored by
Details Implements the following intrinsics:
For converting the ACLE svbool_t type (<n x 16 x i1>) to and from the
Diff Detail
Event TimelineComment Actions Simple patch that implements the conversions to/from svbool_t, looks good to me!
Comment Actions When you convert from <vscale x 2 x i1> to <vscale x 16 x i1> using sve.convert.to.svbool, what values do the other bits contain? Zero? Poison? Something else? Comment Actions When converting from <vscale x 2 x i1> to svbool_t, the other lanes are defined as zero. As you probably spotted, this implementation does not actually guarantee that those lanes are zeroed. At the moment though, this is not really a problem because there are no optimizations on these intrinsics that rely on this definition and the instructions themselves produce predicates where the lanes are zeroed implicitly. If we don't fold the zeroing operation into the instruction that implicitly zeroes the lanes, there is a performance penalty, so we should write some peephole pass to remove the unnecessary instructions. We can either choose to:
Without having properly discussed this and deciding on which way to go, my LGTM was probably a bit premature. Sorry for that! Comment Actions I'd prefer correctness first, optimizations later. If we miss the optimization, it's a giant red flag to anyone reading the assembly. If we miss correctness, some very unlucky person might waste days trying to track down a miscompile.
I'm not sure what you mean by this; you can trivially break the defined semantics by just writing convert.to.svbool(convert.from.svbool(ptrue)). Do you mean the way the C intrinsics are lowered gives some protection? Comment Actions Correct, the intrinsics are intended to be created by the Clang ACLE implementation, which will reinterpret-cast the predicate results to svbool_t. Without optimisations that introduce a pattern like you describe, nothing breaks in practice. Of course when we will start adding such optimisations, that changes things. I'm happy with the decision to focus on correctness first though, it probably shouldn't be too hard to fold away the explicit zeroing. @c-rhodes can you update this patch to explicitly zero the other lanes?
Comment Actions Looks fine other than the change to LowerSPLAT_VECTOR
|
If the conversion isn't changing the type, do you need a REINTERPRET_CAST at all?