This is an archive of the discontinued LLVM Phabricator instance.

[AArch64] Insert/Extract of bitcast patterns
ClosedPublic

Authored by dmgreen on Sep 16 2022, 1:07 AM.

Details

Summary

This adds some quick tablegen patterns for vector_insert(bitcast(..)) and bitcast(vector_extract(..)), allowing us to avoid a round-trip through GPRs.

Diff Detail

Event Timeline

dmgreen created this revision.Sep 16 2022, 1:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 1:07 AM
dmgreen requested review of this revision.Sep 16 2022, 1:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptSep 16 2022, 1:07 AM
samtebbs accepted this revision.Sep 20 2022, 6:14 AM

Nice change

This revision is now accepted and ready to land.Sep 20 2022, 6:14 AM
This revision was landed with ongoing or failed builds.Sep 21 2022, 1:54 AM
This revision was automatically updated to reflect the committed changes.
alexfh added a subscriber: alexfh.Sep 27 2022, 8:29 AM

Looks like some of our code gets miscompiled on aarch64 after this change. I can't yet share an isolated test, but this is the difference in the assembly:

@@ -109,11 +109,10 @@
        ldr     x10, [sp, #48]
        stp     x8, x9, [sp, #64]
        blr     x10
-       scvtf   d0, w0
        mov     w8, #1
-       fmov    x9, d0
+       scvtf   d1, w0
        dup     v0.2d, x8
-       mov     v0.d[0], x9
+       fmov    d0, d1
 .LBB1_5:
        adrp    x1, .L.str.3
        add     x1, x1, :lo12:.L.str.3

The code misbehaves only on arm with -O2 and higher (works fine on arm with -O1 and on x86-64 with any optimization level) and I don't see any undefined behavior in it.

Yeah that certainly looks wrong. The fmov will be clearing the result of the dup. I suspect that INSERT_SUBREG without a IMPLICIT_DEF operand is what is going wrong.

I'll revert the patch.

I've only removed the zero-element patterns in rG401481daac90. I believe those were the ones going wrong. Did that fix the issues that you are seeing? If not we can revert the rest whilst we investigate.

I've only removed the zero-element patterns in rG401481daac90. I believe those were the ones going wrong. Did that fix the issues that you are seeing? If not we can revert the rest whilst we investigate.

Thank you Dave! rG401481daac90 fixes this particular problem. I'm now testing it completely to ensure it doesn't break anything else.

Thanks for the confirmation. Let us know.

Thanks for the confirmation. Let us know.

No new issues discovered after this patch. Thanks again for the quick resolution!