Change how the AVX512 broadcastsd/ss patterns interact with spilling.

The new implementation takes a scalar register and generates the vector directly, without a COPY_TO_REGCLASS that turns the scalar into a VR128 register. The issue with the old patterns is that during register allocation we may spill the scalar value using 128-bit loads and stores, wasting cache bandwidth.

Example:

    declare void @func_f32(float)

    define <8 x float> @_256_broadcast_ss_spill(float %x) {
      %a = fadd float %x, %x
      call void @func_f32(float %a)
      %b = insertelement <8 x float> undef, float %a, i32 0
      %c = shufflevector <8 x float> %b, <8 x float> undef, <8 x i32> zeroinitializer
      ret <8 x float> %c
    }

New implementation:

    vaddss %xmm0, %xmm0, %xmm0
    vmovss %xmm0, 4(%rsp)          # 4-byte Folded Spill
    callq func_f32
    vbroadcastss 4(%rsp), %ymm0    # 4-byte Folded Reload
    popq %rax
    retq

Old implementation:

    vaddss %xmm0, %xmm0, %xmm0
    vmovaps %xmm0, (%rsp)          # 16-byte Spill
    callq func_f32
    vbroadcastss (%rsp), %ymm0     # 16-byte Folded Reload
    addq $24, %rsp
    retq
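
The example above covers only the single-precision (ss) pattern. A double-precision analogue for the broadcastsd pattern might look like the sketch below. The names @func_f64 and @_256_broadcast_sd_spill are hypothetical, chosen to mirror the float example, and the code generated for it is not shown here; the expectation that the scalar would be spilled with an 8-byte store rather than a 128-bit one is an assumption based on the float case.

```llvm
; Hypothetical double-precision counterpart of the float example.
; The call forces %a to stay live across it, so the scalar must be
; spilled and later reloaded for the broadcast.
declare void @func_f64(double)

define <4 x double> @_256_broadcast_sd_spill(double %x) {
  %a = fadd double %x, %x
  call void @func_f64(double %a)
  ; Insert the scalar into lane 0, then splat lane 0 to all four lanes.
  %b = insertelement <4 x double> undef, double %a, i32 0
  %c = shufflevector <4 x double> %b, <4 x double> undef, <4 x i32> zeroinitializer
  ret <4 x double> %c
}
```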