The intent is to lower the clang X form SVE builtins to these
intrinsics. The suffix _x is already in use to signify unpredicated
SVE intrinsics, hence my choice of _u to signify those intrinsics
where the result for inactive lanes is undefined.
Repository: rG LLVM Github Monorepo
Event Timeline
This, along with D141939, forms my counter-proposal to D141240 and represents my preferred option. The only downside I can see is that some existing instruction combines might need to be updated to also consider these new intrinsics. That said, the combines we typically do are likely unnecessary for these intrinsics, because we can now do similar transformations during code generation (for example, converting add+mul to mla), and those transformations are likely to be desirable for auto-vectorised IR anyway. D141938 shows the more immediate benefit of these intrinsics: freeing up isel to allow better code generation.
The dedicated intrinsics seem better than the metadata. The cost of the extra names seems marginal compared to the complexity of ensuring we handle the metadata correctly.
That said, in some of these cases, the intrinsic seems marginally useful. What's the point of inventing llvm.aarch64.sve.add.u if we're going to immediately convert it to an LLVM IR "add"? Why not make clang just emit the "add" directly?
For the original ACLE design I made the mistake of assuming that representing all the builtins via merging intrinsics would not affect code quality. My rationale here is to avoid repeating that mistake, so that at least at the point of IR creation the IR captures the maximum amount of information. For a more practical reason, please consider:
add_u(pg, A, lsr_m(pg, B, C)) => usra(A, B, C) (Valid)
add(A, lsr_m(pg, B, C)) => usra(A, B, C) (Invalid)
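To make the difference concrete, here is a rough lane-level model of the two rewrites in Python. It is only a sketch under stated assumptions: lanes are 8-bit unsigned, the shift amount is a single scalar, an undefined lane is modelled as None, and is_valid_replacement is a hypothetical helper rather than anything in LLVM. The operation names mirror the example above, with lsr_m keeping its first data operand in inactive lanes.

```python
# Lane-level model of the two rewrites; None stands for an undefined lane.
MASK = 0xFF  # assumption: 8-bit unsigned lanes


def lsr_m(pg, b, shift):
    # Merging form: inactive lanes keep the first data operand, b.
    return [(x >> shift) if p else x for p, x in zip(pg, b)]


def add_u(pg, a, x):
    # "_u" form: the result for inactive lanes is undefined.
    return [((ai + xi) & MASK) if p else None for p, ai, xi in zip(pg, a, x)]


def add(a, x):
    # Unpredicated add: every lane is defined.
    return [(ai + xi) & MASK for ai, xi in zip(a, x)]


def usra(a, b, shift):
    # Unpredicated shift-right-and-accumulate on every lane.
    return [(ai + (bi >> shift)) & MASK for ai, bi in zip(a, b)]


def is_valid_replacement(original, replacement):
    # Hypothetical check: the replacement must agree with the original
    # on every lane where the original is defined.
    return all(o is None or o == r for o, r in zip(original, replacement))


pg = [True, False, True, False]
a = [1, 2, 3, 4]
b = [16, 32, 64, 128]
shift = 2

shifted = lsr_m(pg, b, shift)  # inactive lanes still hold b
folded = usra(a, b, shift)

valid_u = is_valid_replacement(add_u(pg, a, shifted), folded)
valid_plain = is_valid_replacement(add(a, shifted), folded)
print(valid_u, valid_plain)  # prints "True False"
```

In the inactive lanes the plain add observes the unshifted B that lsr_m merged in, so usra changes a defined result. With add_u those lanes are undefined, so any value usra produces there is acceptable.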
Also, for what it's worth, the lowering of add intrinsics to ISD::ADD came at a time when maximising code quality during isel was not possible. I believe D141938 changes that, so we may end up reverting some of those transforms. It will just depend on which common combines we might lose out on. Although there is an argument that those combines should be replicated for the predicated nodes anyway.