Previous pattern was emitting ops in sequence, which just increases the
latency (to 3c, same as imul!), i.e.:
(add/sub (add/sub (shl x, N), x), x)
Better is to compute 2x independently of x << N for better ILP, i.e.:
(add/sub (shl x, N), (add x, x))
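The two shapes above can be sketched in plain C++ to show that they compute the same product while differing in dependency depth. This is only an illustration, not the code in X86ISelLowering.cpp: the function names `mul_chained` and `mul_independent` and the `add` flag are hypothetical, and the depth comments assume each shift/add is a single-cycle op.

```cpp
#include <cstdint>

// Old shape: (add/sub (add/sub (shl x, N), x), x).
// Each op consumes the previous result, so the chain is three ops deep.
constexpr uint32_t mul_chained(uint32_t x, unsigned n, bool add) {
    uint32_t t = x << n;         // op 1
    t = add ? t + x : t - x;     // op 2, depends on op 1
    return add ? t + x : t - x;  // op 3, depends on op 2
}

// New shape: (add/sub (shl x, N), (add x, x)).
// The shift and the doubling both depend only on x, so they can issue
// together; the final add/sub makes the chain two ops deep.
constexpr uint32_t mul_independent(uint32_t x, unsigned n, bool add) {
    uint32_t hi = x << n;    // independent of two_x
    uint32_t two_x = x + x;  // independent of hi
    return add ? hi + two_x : hi - two_x;
}

// Compile-time spot checks against the plain multiply (N = 3).
static_assert(mul_chained(3, 3, true) == 3u * ((1u << 3) + 2u), "2^N + 2");
static_assert(mul_independent(3, 3, true) == 3u * ((1u << 3) + 2u), "2^N + 2");
static_assert(mul_chained(3, 3, false) == 3u * ((1u << 3) - 2u), "2^N - 2");
static_assert(mul_independent(3, 3, false) == 3u * ((1u << 3) - 2u), "2^N - 2");
```

Both functions use unsigned arithmetic, so the results wrap modulo 2^32 in the same way a 32-bit multiply would.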
Differential D141113
[X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute independently to x << N
Closed, Public
Authored by goldstein.w.n on Jan 6 2023, 12:54 AM.
Event Timeline
LGTM - cheers
This revision is now accepted and ready to land. (Jan 6 2023, 3:10 AM)
goldstein.w.n retitled this revision from "Improve mul 2^N +/- 2 pattern" to "[X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute independently to x << N". (Jan 6 2023, 9:12 AM)
Done I think.
LGTM.
Closed by commit rG4196ca3278f7: [X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute… (authored by goldstein.w.n, committed by pengfei). (Jan 12 2023, 8:54 PM)
This revision was automatically updated to reflect the committed changes.
Revision Contents
Diff 486899
llvm/lib/Target/X86/X86ISelLowering.cpp
llvm/test/CodeGen/X86/mul-constant-i16.ll
llvm/test/CodeGen/X86/mul-constant-i32.ll
llvm/test/CodeGen/X86/mul-constant-i64.ll
llvm/test/CodeGen/X86/mul-constant-i8.ll
llvm/test/CodeGen/X86/mul-constant-result.ll
Regression?