I believe this is the correct fix for D75506 rather than disabling all commuting. We can still commute the remaining two sources.
Details
Diff Detail
Event Timeline
| llvm/test/CodeGen/X86/avx512-intrinsics.ll | ||
|---|---|---|
| 5822 | Not too familiar with this code path, but we can shrink this test a bit and still crash: | |

    define <4 x float> @test_int_x86_avx512_maskz_vfmadd_ss_load0(i1 zeroext %t0, <4 x float>* nocapture readonly %t1, float %t2, float %t3) {
      %t5 = load <4 x float>, <4 x float>* %t1, align 16
      %t6 = extractelement <4 x float> %t5, i64 0
      %t9 = tail call float @llvm.fma.f32(float %t6, float %t2, float %t3) #2
      %t12 = select i1 %t0, float %t9, float 0.0
      %t13 = insertelement <4 x float> %t5, float %t12, i64 0
      ret <4 x float> %t13
    }

| llvm/test/CodeGen/X86/avx512-intrinsics.ll | ||
|---|---|---|
| 5838 | (sidenote) Losing the mask register from the asm comment is really bad. | |
Comment Actions
Fixed in 6ca96765c7e6f63b45e6c311918a648ef684ea20, but I mistyped the Differential Revision line in the commit message.