This is an archive of the discontinued LLVM Phabricator instance.

[SDAG] enable binop identity constant folds for multiplies
ClosedPublic

Authored by RKSimon on Mar 19 2022, 8:49 AM.

Details

Summary

Add mul to the list of ops that we canonicalize with a select to expose an identity merge

@lebedev.ri I'm not familiar with the midpoint vector tests - your comments on rGc38831e11dc33d2a83 suggest the vector tests were just for completeness, but the simplification in codegen suggests there might be additional missing DAG folds ?

Diff Detail

Event Timeline

RKSimon created this revision.Mar 19 2022, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 8:49 AM
RKSimon requested review of this revision.Mar 19 2022, 8:49 AM
Herald added a project: Restricted Project. · View Herald TranscriptMar 19 2022, 8:49 AM
RKSimon added inline comments.Mar 20 2022, 2:19 PM
llvm/test/CodeGen/X86/midpoint-int-vec-128.ll
3580

@lebedev.ri This code now optimizes considerably in IR: https://llvm.godbolt.org/z/edhMY1crG

Should these test cases be updated to match?

define <16 x i8> @vec128_i8_signed_reg_mem(<16 x i8> %a1, <16 x i8>* nocapture readonly %a2_addr) {
  %a2 = load <16 x i8>, <16 x i8>* %a2_addr, align 16
  %t3 = icmp slt <16 x i8> %a2, %a1
  %1 = tail call <16 x i8> @llvm.smin.v16i8(<16 x i8> %a2, <16 x i8> %a1)
  %2 = tail call <16 x i8> @llvm.smax.v16i8(<16 x i8> %a2, <16 x i8> %a1)
  %t7 = sub <16 x i8> %2, %1
  %t8 = lshr <16 x i8> %t7, <i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1, i8 1>
  %3 = sub nsw <16 x i8> zeroinitializer, %t8
  %4 = select <16 x i1> %t3, <16 x i8> %3, <16 x i8> %t8
  %a10 = add nsw <16 x i8> %4, %a1
  ret <16 x i8> %a10
}
declare <16 x i8> @llvm.smin.v16i8(<16 x i8>, <16 x i8>)
declare <16 x i8> @llvm.smax.v16i8(<16 x i8>, <16 x i8>)

ping - any thoughts?

lebedev.ri accepted this revision.Mar 25 2022, 3:49 AM

Seems fine to me.

This revision is now accepted and ready to land.Mar 25 2022, 3:49 AM
LuoYuanke accepted this revision.EditedMar 25 2022, 3:53 AM

LGTM, too.

Thanks @LuoYuanke - can you think of any more nodes that we should add? There's plenty that can be added for completeness but I'm not sure which ones are causing actual perf issues - AND maybe? OR/XOR are less likely.

This revision was landed with ongoing or failed builds.Mar 25 2022, 4:07 AM
This revision was automatically updated to reflect the committed changes.

Thanks @LuoYuanke - can you think of any more nodes that we should add? There's plenty that can be added for completeness but I'm not sure which ones are causing actual perf issues - AND maybe? OR/XOR are less likely.

From https://godbolt.org/z/8Mv6sj8hs, all the vector operation that getSelectFoldableOperands() covers can be compiled to optimal X86 instructions. The performance issue that is captured is about float operation (fsub, fadd) and they has been fixed. Thanks, Simon.