This is an archive of the discontinued LLVM Phabricator instance.

[X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute independently to x << N
ClosedPublic

Authored by goldstein.w.n on Jan 6 2023, 12:54 AM.

Details

Summary

The previous pattern emitted the ops in sequence, which just increases the
latency (to 3c, same as imul!), i.e.:

(add/sub (add/sub (shl x, N), x), x)

It is better to compute 2x independently of x << N for better ULP, i.e.:

(add/sub (shl x, N), (add x, x))
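
As an illustration only (this is not the code from the patch), the two shapes can be sketched in plain C++. The function names and the `patterns_agree` helper are hypothetical, and the dependency-depth comments assume that the shift and each add/sub take one cycle. Both shapes compute x * (2^N +/- 2) modulo 2^32; the difference is only in how deep the dependency chain is.

```cpp
#include <cstdint>

// Old shape: (add (add (shl x, N), x), x).
// Every op waits on the previous one, so the chain is three ops deep.
inline std::uint32_t mul_plus2_serial(std::uint32_t x, unsigned n) {
    std::uint32_t t = x << n;  // shl x, N
    t = t + x;                 // depends on the shl
    return t + x;              // depends on the previous add
}

// New shape: (add (shl x, N), (add x, x)).
// The shl and the add x, x do not depend on each other,
// so the chain is only two ops deep.
inline std::uint32_t mul_plus2_parallel(std::uint32_t x, unsigned n) {
    std::uint32_t hi = x << n;    // shl x, N
    std::uint32_t twice = x + x;  // independent of hi
    return hi + twice;
}

// Same two shapes for the 2^N - 2 case.
inline std::uint32_t mul_minus2_serial(std::uint32_t x, unsigned n) {
    std::uint32_t t = x << n;
    t = t - x;
    return t - x;
}

inline std::uint32_t mul_minus2_parallel(std::uint32_t x, unsigned n) {
    std::uint32_t hi = x << n;
    std::uint32_t twice = x + x;
    return hi - twice;
}

// Checks, for a few sample inputs, that both shapes match an ordinary
// multiply by (2^n + 2) and (2^n - 2). Requires 0 < n < 32.
inline bool patterns_agree(unsigned n) {
    const std::uint32_t samples[] = {0u, 1u, 7u, 12345u, 0x80000000u, 0xFFFFFFFFu};
    const std::uint32_t pow2 = std::uint32_t{1} << n;
    for (std::uint32_t x : samples) {
        if (mul_plus2_serial(x, n) != x * (pow2 + 2u)) return false;
        if (mul_plus2_parallel(x, n) != x * (pow2 + 2u)) return false;
        if (mul_minus2_serial(x, n) != x * (pow2 - 2u)) return false;
        if (mul_minus2_parallel(x, n) != x * (pow2 - 2u)) return false;
    }
    return true;
}
```

Unsigned arithmetic is used so that wraparound is well defined and matches what the multiply produces.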

Event Timeline

goldstein.w.n created this revision. Jan 6 2023, 12:54 AM
Herald added a project: Restricted Project. Jan 6 2023, 12:54 AM
goldstein.w.n requested review of this revision. Jan 6 2023, 12:54 AM
pengfei added inline comments. Jan 6 2023, 1:37 AM
llvm/test/CodeGen/X86/mul-constant-result.ll
166–175

Regression?

RKSimon accepted this revision. Jan 6 2023, 3:10 AM

LGTM - cheers

llvm/test/CodeGen/X86/mul-constant-result.ll
166–175

The increase in lines seems to be due to extra labels/cfi-directives - tbh I'd take the extra LEA if we reduce control flow instructions.

This revision is now accepted and ready to land. Jan 6 2023, 3:10 AM

Can you describe the old vs new pattern in the description?

goldstein.w.n added inline comments. Jan 6 2023, 9:05 AM
llvm/test/CodeGen/X86/mul-constant-result.ll
166–175

The increase in lines seems to be due to extra labels/cfi-directives - tbh I'd take the extra LEA if we reduce control flow instructions.

166–175

Regression?

I think the tails of some cases fold when it's sub; sub, so there are fewer lines of code but more jumps.
Issue?

Improved description / summary

goldstein.w.n retitled this revision from Improve mul 2^N +/- 2 pattern to [X86] Improve mul x, 2^N +/- 2 pattern by making the +/- 2x compute independently to x << N. Jan 6 2023, 9:12 AM
goldstein.w.n edited the summary of this revision.

Can you describe the old vs new pattern in the description?

Done I think.

pengfei accepted this revision. Jan 7 2023, 4:33 AM

LGTM.

llvm/test/CodeGen/X86/mul-constant-result.ll
166–175

Looking at it again: although the new code has one more BB, it has one fewer jmp, so the new code is better.