Page MenuHomePhabricator

[AMDGPU] Fixed i64 add/sub used in lowering of i64 srem
AbandonedPublic

Authored by tpr on Mar 19 2019, 1:07 PM.

Details

Reviewers
michel.daenzer
Summary

My commit rL356399 "[AMDGPU] Asm/disasm clamp modifier on vop3 int arithmetic"
broke a case of i64 srem being lowered. Fixed.

Change-Id: Id274ae6ac3c8687a23999ea239f383b37d812fab

Diff Detail

Event Timeline

tpr created this revision.Mar 19 2019, 1:07 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2019, 1:07 PM

Testcase should be reducible . You can use the uaddo intrinsic directly

tpr added a comment.Mar 20 2019, 7:00 AM

The test is already reduced as much as I can. Removing anything in there makes the problem disappear. Constructing a new test case using llvm.uadd.with.overflow does not show the problem. Can we go with this test case?

In D59556#1436285, @tpr wrote:

The test is already reduced as much as I can. Removing anything in there makes the problem disappear. Constructing a new test case using llvm.uadd.with.overflow does not show the problem. Can we go with this test case?

I managed with this:

define amdgpu_kernel void @v_uaddo_i32(i32 addrspace(1)* %out, i1 addrspace(1)* %carryout, i32 addrspace(1)* %a.ptr, i32 addrspace(1)* %b.ptr, float %dummy.val) #0 {
  %tid = call i32 @llvm.amdgcn.workitem.id.x()
  %tid.ext = sext i32 %tid to i64
  %a.gep = getelementptr inbounds i32, i32 addrspace(1)* %a.ptr
  %b.gep = getelementptr inbounds i32, i32 addrspace(1)* %b.ptr
  %a = load volatile i32, i32 addrspace(1)* %a.gep, align 4
  %b = load volatile i32, i32 addrspace(1)* %b.gep, align 4
  %uadd0 = call { i32, i1 } @llvm.uadd.with.overflow.i32(i32 %a, i32 %b)
  %val0 = extractvalue { i32, i1 } %uadd0, 0
  %carry0 = extractvalue { i32, i1 } %uadd0, 1
  store volatile i32 %val0, i32 addrspace(1)* %out, align 4
  store i1 %carry0, i1 addrspace(1)* %carryout

  ; Force a use of an i1 0 that will be materialized in a register,
  ; which will be selected before the uaddo (so its operand is
  ; repalced with the materialized node)
  %fmas = call float @llvm.amdgcn.div.fmas.f32(float %dummy.val, float %dummy.val, float %dummy.val, i1 false)
  store volatile float %fmas, float addrspace(1)* null
  ret void
}

declare float @llvm.amdgcn.div.fmas.f32(float, float, float, i1)
declare i32 @llvm.amdgcn.workitem.id.x() #1
declare { i32, i1 } @llvm.uadd.with.overflow.i32(i32, i32) #1

attributes #0 = { nounwind }
attributes #1 = { nounwind readnone }
tpr abandoned this revision.Mar 20 2019, 1:15 PM

Thanks for the better test Matt. But I'll abandon this one in favor of Michael's improved fix D59608.