This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Mark known zero for high 16-bits of uaddlv intrinsic output with v8i8
ClosedPublic

Authored by jaykang10 on Aug 23 2023, 6:35 AM.

Details

Summary

If llvm.aarch64.neon.uaddlv intrinsic has v8i8 type input, the it returns 16-bits value.
clang generates llvm.aarch64.neon.uaddlv.i32.v8i8 and trunc to i16 for vaddlv_u8 neon intrinsic. It causes additional and 0xffff instruction from attached example as below.

foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w8, s0
        and     w0, w8, #0xffff
        ret

If we mark know zero for high 16-bits of uaddlv intrinsic output with v8i8, we can avoid the additional and 0xfff.

foo:                                    // @foo
        uaddlv  h0, v0.8b
        fmov    w0, s0
        ret

Diff Detail

Unit TestsFailed

Event Timeline

jaykang10 created this revision.Aug 23 2023, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 6:35 AM
jaykang10 requested review of this revision.Aug 23 2023, 6:35 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 6:35 AM
dmgreen accepted this revision.Aug 24 2023, 2:00 AM

LGTM, thanks.

This revision is now accepted and ready to land.Aug 24 2023, 2:00 AM

Can we also handle v16i8?

Is it worth trying to use a more precise bound, instead of just the approximate power of two? The largest number of bits that can actually be set for a v8i8 is 11 (the number of bits set in 8*255). If we're using a more precise bound like this, we could also handle other forms of uaddlv.

Thanks for comments.

Can we also handle v16i8?

Yep, let me add the type.

Is it worth trying to use a more precise bound, instead of just the approximate power of two? The largest number of bits that can actually be set for a v8i8 is 11 (the number of bits set in 8*255). If we're using a more precise bound like this, we could also handle other forms of uaddlv.

Yep, I think it would be good to make more bits zero. Let me update the bound.

jaykang10 updated this revision to Diff 554234.Aug 29 2023, 3:02 AM

Following @efriedma's comment, changed the bound to 11 and supported v16i8 type.

@efriedma I have updated the patch. Please check it.

jaykang10 updated this revision to Diff 554251.Aug 29 2023, 3:50 AM

Fixed wrong bound for v16i8.

efriedma accepted this revision.Aug 29 2023, 10:11 AM

It looks like you uploaded a diff on top of the first patch instead of the whole patch? Please make sure you commit the right thing.

LGTM with that fixed

Sorry, I pushed the first patch last week without mentioning this review accidentally...
Let me push this patch on the top of the first one.
Thanks for comments.

This revision was landed with ongoing or failed builds.Aug 30 2023, 12:22 AM
This revision was automatically updated to reflect the committed changes.