Details
Diff Detail
Event Timeline
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
8603–8604 | I don't see why we would bother disabling specifically the target combines at -O0, but not all of them. I thought combines were already partially skipped at -O0? |
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
8603–8604 | I do not see it. Moreover, I see that combiner works regardless of optimization level. |
Sorry for taking so long to look at this. I am running some OCL conformance tests to get an idea of what impact this will have in terms of regressions, but if we believe this is the correct thing to do I don't think that should have any impact on when it is merged anyway.
lib/Target/AMDGPU/SIISelLowering.cpp | ||
---|---|---|
8704–8721 | The only regression I see is for something of the form: target triple = "amdgcn-amd-amdhsa" ; Function Attrs: noinline optnone define void @spam() #0 { %tmp = load <3 x i16>, <3 x i16> addrspace(5)* undef, align 8 %tmp1 = insertelement <3 x i16> %tmp, i16 0, i64 0 store <3 x i16> %tmp1, <3 x i16> addrspace(5)* undef, align 8 ret void } attributes #0 = { noinline optnone "target-cpu"="fiji" } For which we produce: Optimized legalized selection DAG: %bb.0 'spam:' SelectionDAG has 19 nodes: t0: ch = EntryToken t43: v2i16 = scalar_to_vector Constant:i16<0> t57: i32 = bitcast t43 t68: ch = store<(store 2 into `<3 x i16> addrspace(5)* undef`, align 8, addrspace 5), trunc to i16> t40:1, t57, undef:i32, undef:i32 t41: v2i32 = BUILD_VECTOR t40, t40 t63: i64 = bitcast t41 t36: i64 = srl t63, Constant:i32<32> t37: i16 = truncate t36 t21: ch = store<(store 2 into `<3 x i16> addrspace(5)* undef` + 4, align 4, addrspace 5)> t40:1, t37, undef:i32, undef:i32 t22: ch = TokenFactor t68, t21 t9: i64,ch = CopyFromReg t0, Register:i64 %0 t11: ch,glue = CopyToReg t22, Register:i64 $sgpr30_sgpr31, t9 t40: i32,ch = load<(load 4 from `<3 x i16> addrspace(5)* undef`, align 8, addrspace 5)> t0, undef:i32, undef:i32 t12: ch = RET_FLAG t11, Register:i64 $sgpr30_sgpr31, t11:1 And then fail to select t43: v2i16 = scalar_to_vector Constant:i16<0> |
LGTM; if there are other regressions we can address them as they come up, but I don't see any major ones now.
I don't see why we would bother disabling specifically the target combines at -O0, but not all of them. I thought combines were already partially skipped at -O0?