Add a combiner helper that replaces a G_UNMERGE whose destination lanes are
all dead except the first one with a G_TRUNC of the source.
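For scalar types, the rewrite relies on the first result of an unmerge holding the low bits of the source, which is the same value a truncate produces. Below is a minimal standalone sketch of that equivalence in plain C++. It does not use any LLVM API; `unmergeValues` and `truncTo` are hypothetical names made up for illustration, and the source is fixed at 64 bits to keep the model small.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical model of an unmerge on a 64-bit scalar: split Src into
// 64 / LaneBits lanes, with lane 0 holding the least significant bits.
std::vector<uint64_t> unmergeValues(uint64_t Src, unsigned LaneBits) {
  assert(LaneBits > 0 && LaneBits < 64 && 64 % LaneBits == 0);
  const uint64_t Mask = (uint64_t{1} << LaneBits) - 1;
  std::vector<uint64_t> Lanes;
  for (unsigned Shift = 0; Shift < 64; Shift += LaneBits)
    Lanes.push_back((Src >> Shift) & Mask);
  return Lanes;
}

// Hypothetical model of a truncate: keep only the low DstBits bits of Src.
uint64_t truncTo(uint64_t Src, unsigned DstBits) {
  assert(DstBits > 0 && DstBits < 64);
  return Src & ((uint64_t{1} << DstBits) - 1);
}
```

When every lane but the first is unused, `unmergeValues(X, N)[0]` can be replaced by `truncTo(X, N)` without changing the result. This only models the scalar case. For vector types the patch wraps the truncate in bitcasts, as the inline comment on the AMDGPU test shows.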
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.store.2d.d16.ll | ||
---|---|---|
164 | @arsenm At first glance all the changes in AMDGPU seem fine except this one.

Looking at when the transformation kicks in, the input is:

    %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
    %3:_(<3 x s16>), %17:_(<3 x s16>) = G_UNMERGE_VALUES %16:_(<6 x s16>)
    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
    S_ENDPGM 0

And the output is:

    %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
    %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
    %20:_(s48) = G_TRUNC %19:_(s96)
    %3:_(<3 x s16>) = G_BITCAST %20:_(s48)
    G_INTRINSIC_W_SIDE_EFFECTS intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %1:_(s32), %2:_(s32), %0:_(<8 x s32>), 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
    S_ENDPGM 0

So far so good. The craziness shows up after the legalizer:

    %16:_(<6 x s16>) = G_CONCAT_VECTORS %13:_(<2 x s16>), %14:_(<2 x s16>), %15:_(<2 x s16>)
    %19:_(s96) = G_BITCAST %16:_(<6 x s16>)
    %28:_(s32), %29:_(s32), %30:_(s32) = G_UNMERGE_VALUES %19:_(s96)
    %35:_(s32) = G_CONSTANT i32 16
    %36:_(s32) = G_LSHR %28:_, %35:_(s32)
    %37:_(s32) = G_LSHR %29:_, %35:_(s32)
    %46:_(s32) = G_CONSTANT i32 65535
    %49:_(s32) = COPY %28:_(s32)
    %40:_(s32) = G_AND %49:_, %46:_
    %48:_(s32) = COPY %36:_(s32)
    %41:_(s32) = G_AND %48:_, %46:_
    %42:_(s32) = G_SHL %41:_, %35:_(s32)
    %38:_(s32) = G_OR %40:_, %42:_
    %32:_(<2 x s16>) = G_BITCAST %38:_(s32)
    %47:_(s32) = COPY %29:_(s32)
    %43:_(s32) = G_AND %47:_, %46:_
    %44:_(s32) = G_CONSTANT i32 0
    %45:_(s32) = G_SHL %44:_, %35:_(s32)
    %39:_(s32) = G_OR %43:_, %45:_
    %33:_(<2 x s16>) = G_BITCAST %39:_(s32)
    %34:_(<6 x s16>) = G_CONCAT_VECTORS %32:_(<2 x s16>), %33:_(<2 x s16>), %15:_(<2 x s16>)
    %3:_(<3 x s16>) = G_EXTRACT %34:_(<6 x s16>), 0
    %21:_(<2 x s32>) = G_BUILD_VECTOR %1:_(s32), %2:_(s32)
    G_AMDGPU_INTRIN_IMAGE_STORE intrinsic(@llvm.amdgcn.image.store.2d), %3:_(<3 x s16>), 7, %21:_(<2 x s32>), $noreg, %0:_(<8 x s32>), 0, 0, 0 :: (dereferenceable store 6 into custom "TargetCustom8", align 8)
    S_ENDPGM 0

Do you think the AMDGPU target is missing something, or should I disable the combine for vector types, at least for now? |
llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.s.buffer.load.ll | ||
---|---|---|
34 | FYI, this change is only because update_mir no longer wants to reuse prefixes for RUN lines :(. |
llvm/test/CodeGen/AMDGPU/GlobalISel/legalize-llvm.amdgcn.image.store.2d.d16.ll | ||
---|---|---|
164 | This is fine. <3 x s16> is problematic and I'm working on eliminating all of them now. |
llvm/lib/CodeGen/GlobalISel/CombinerHelper.cpp | ||
---|---|---|
1683 | can use auto and avoid .getReg(0) |