This adds some quick tablegen patterns for vector_insert(bitcast(..)) and bitcast(vector_extract(..)), allowing us to avoid a round-trip through GPRs.
Details
Diff Detail
Unit Tests
Time | Test | |
---|---|---|
60,040 ms | x64 debian > libFuzzer.libFuzzer::fuzzer-leak.test |
Event Timeline
Looks like some of our code gets miscompiled on aarch64 after this change. I can't yet share an isolated test, but this is the difference in the assembly:
@@ -109,11 +109,10 @@ ldr x10, [sp, #48] stp x8, x9, [sp, #64] blr x10 - scvtf d0, w0 mov w8, #1 - fmov x9, d0 + scvtf d1, w0 dup v0.2d, x8 - mov v0.d[0], x9 + fmov d0, d1 .LBB1_5: adrp x1, .L.str.3 add x1, x1, :lo12:.L.str.3
The code misbehaves only on arm with -O2 and higher (works fine on arm with -O1 and on x86-64 with any optimization level) and I don't see any undefined behavior in it.
Yeah that certainly looks wrong. The fmov will be clearing the result of the dup. I suspect that INSERT_SUBREG without a IMPLICIT_DEF operand is what is going wrong.
I'll revert the patch.
I've only removed the zero-element patterns in rG401481daac90. I believe those were the ones going wrong. Did that fix the issues that you are seeing? If not we can revert the rest whilst we investigate.
Thank you Dave! rG401481daac90 fixes this particular problem. I'm now testing it completely to ensure it doesn't break anything else.