Solves #59869 (https://github.com/llvm/llvm-project/issues/59869)
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
36 | Not sure if it's a regression here. Yes there's one more instruction, but we're using more scalar instructions so isn't it beneficial in the end? | |
llvm/test/CodeGen/AMDGPU/usubo.ll | ||
19 | This looks a bit like a regression but I'm not sure how to address it. The pattern comes from zext (setcc). |
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
32 | I'm not sure this is correct. The old code treated s[4:5] like a divergent boolean, with a bit for each *active* lane. The new code assumes the boolean value is in bit 0 - but will that work if lane 0 is not active? |
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
34 | This looks like we could avoid the v_cndmask_b32_e64 if we used v_addc with the carry input coming from s[4:5] |
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
36 | It's one more instruction because of the copy to VGPR for the store. We probably should have a rematerialize-as-VALU optimization to handle this kind of case. | |
llvm/test/CodeGen/AMDGPU/usubo.ll | ||
19 | Before this was a 32-bit select, so I assume this was a zext to i32 so I don't see why this zext to i64 change matters. What was the DAG here? |
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
32 | Very good question; I have a bit of trouble following what V_CNDMASK does exactly in this case. This function doesn't seem to select with global isel so I can't compare with that | |
llvm/test/CodeGen/AMDGPU/usubo.ll | ||
19 | It was a zext to i64 t41: i1 = setcc t39, t51, setugt:ch t30: i64 = zero_extend t41 t31: i64 = add t39, t30 t37: v2i32 = bitcast t31 |
llvm/test/CodeGen/AMDGPU/saddo.ll | ||
---|---|---|
32 | For each active lane, v0 get 0 if the corresponding bit of s[4:5] if 0, 1 if the corresponding bit is 1. v_cndmask v0, v1, v2, s0 is a bit like v0 = s0 ? v2 : v1 (NB v1/v2 are swapped!) where s0 holds the condition as a divergent boolean, 1 bit per lane. |
I'm not sure this is correct. The old code treated s[4:5] like a divergent boolean, with a bit for each *active* lane. The new code assumes the boolean value is in bit 0 - but will that work if lane 0 is not active?