This is an archive of the discontinued LLVM Phabricator instance.

[SVE] Add intrinsics for integer binops that explicitly undefine the result for inactive lanes.
ClosedPublic

Authored by paulwalker-arm on Jan 17 2023, 7:45 AM.

Details

Summary

The intent is to lower the clang X form SVE builtins to these
intrinsics. The suffix _x is already in use to signify unpredicated
SVE intrinsics hence my choice to use _u to signify those intrinsics
where the result for inactive lanes is undefined.

Diff Detail

Event Timeline

paulwalker-arm created this revision.Jan 17 2023, 7:45 AM
Herald added a project: Restricted Project. · View Herald Transcript
paulwalker-arm requested review of this revision.Jan 17 2023, 7:45 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 17 2023, 7:45 AM

This along with D141939 forms my counter proposal to D141240 and represents my preferred option. The only downside that I can see is that some existing instruction combines might need to be updated to also consider these new intrinsics. With that said the combines we typically do are likely not necessary for these intrinsics because we can now do similar transformation during code generation (for example converting add+mul to mla) with those transformations likely to be desirable for auto-vectorised IR anyway. D141938 shows the more immediate benefit of these intrinsics in freeing up isel to allow better code generation.

NOTE: If this proves the winning design then I'm tempted to add instruction simplications to convert calls to the existing intrinsics where the predicate is all active to these new _u forms. Doing this will allow us to remove some duplication when pattern matching and also mean function like convertMergedOpToPredOp become redundant.

The dedicated intrinsics seem better than the metadata. The cost of the extra names seems marginal compared to the complexity of ensuring we handle the metadata correctly.

That said, in some of these cases, the intrinsic seems marginally useful. What's the point of inventing llvm.aarch64.sve.add.u if we're going to immediately convert it to an LLVM IR "add"? Why not make clang just emit the "add" directly?

For the original ACLE design I made a mistake by assuming representing all the builtins via merging intrinsics would not affect code quality. So my rational here is not to make the same mistake again and thus at least at the point of IR creation have the IR capture the maximum amount of information. For a more practical reason, please consider:

add_u(pg, A, lsr_m(pg,B,C) => usra(A,B,C)	(Valid)
add(A, lsr_m(pg,B,C) => usra(A,B,C)		(Invalid)

Also, for what it's worth the lowering of add intrinsics to ISD::ADD came at a time when maximising code quality during isel was not possible. I believe D141938 changes that so we may end up reverting some of those transforms. It'll just depends on what common combines we might loose out on. Although, there is an argument that those combines should be replicated for the predicated nodes anyway.

efriedma accepted this revision.Jan 18 2023, 11:15 AM

LGTM

That explanation makes sense.

This revision is now accepted and ready to land.Jan 18 2023, 11:15 AM
Matt added a subscriber: Matt.Jan 18 2023, 2:06 PM
This revision was landed with ongoing or failed builds.Jan 22 2023, 4:26 AM
This revision was automatically updated to reflect the committed changes.