My commit rL356399 "[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic"
broke a case of i64 srem being lowered. Fixed.
Change-Id: Id274ae6ac3c8687a23999ea239f383b37d812fab
Differential D59556
[AMDGPU] Fixed i64 add/sub used in lowering of i64 srem
Authored by tpr on Mar 19 2019, 1:07 PM.
Event Timeline

The test is already reduced as much as I can. Removing anything in there makes the problem disappear. Constructing a new test case using llvm.uadd.with.overflow does not show the problem. Can we go with this test case?

I managed with this:

    define amdgpu_kernel void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr, float %dummy.val) #0 {
      %tid = call i32 @llvm.amdgcn.workitem.id.x()
      %tid.ext = sext i32 %tid to i64
      %a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr
      %b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr
      %a = load volatile i32, i32 addrspace(1)* %a.gep, align 4
      %b = load volatile i32, i32 addrspace(1)* %b.gep, align 4
      %uadd0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
      %val0 = extractvalue { i32, i1 } %uadd0, 0
      %carry0 = extractvalue { i32, i1 } %uadd0, 1
      store volatile i32 %val0, i32 addrspace(1)* %out, align 4
      store i1 %carry0, i1 addrspace(1)* %carryout

      ; Force a use of an i1 0 that will be materialized in a register,
      ; which will be selected before the uaddo (so its operand is
      ; replaced with the materialized node)
      %fmas = call float @llvm.amdgcn.div.fmas.f32(float %dummy.val, float %dummy.val, float %dummy.val, i1 false)
      store volatile float %fmas, float addrspace(1)* null
      ret void
    }

    declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1)
    declare i32 @llvm.amdgcn.workitem.id.x() #1
    declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1

    attributes #0 = { nounwind }
    attributes #1 = { nounwind readnone }

Thanks for the better test, Matt. But I'll abandon this one in favor of Michael's improved fix, D59608.