This is an archive of the discontinued LLVM Phabricator instance.

[AMDGPU] SDWA Peephole: improve search for immediates in SDWA patterns
Closed, Public

Authored by SamWot on Mar 29 2017, 2:10 AM.

Details

Summary

Previously, the compiler often extracted a common immediate into a separate register, e.g.:

%vreg0 = S_MOV_B32 0xff;
%vreg2 = V_AND_B32_e32 %vreg0, %vreg1
%vreg4 = V_AND_B32_e32 %vreg0, %vreg3

Because of this, the SDWA peephole failed to find an SDWA-convertible pattern. In the example above, the two V_AND_B32_e32 instructions could be converted into 2 SDWA src operands:

SDWA src: %vreg2 src_sel:BYTE_0
SDWA src: %vreg4 src_sel:BYTE_0
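
To illustrate why an AND with 0xff corresponds to src_sel:BYTE_0, here is a minimal standalone sketch. It is not code from the patch: the SrcSel enum, selForAndMask and readSel are hypothetical names, and the 0xffff/WORD_0 case is an assumption added for symmetry; only the 0xff/BYTE_0 case comes from the example above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for the SDWA source selectors (not the LLVM enum).
enum class SrcSel { BYTE_0, BYTE_1, BYTE_2, BYTE_3, WORD_0, WORD_1 };

// Map an AND mask to the selector that reads exactly the same bits, if any.
// Only 0xff -> BYTE_0 is taken from the summary; 0xffff -> WORD_0 is an
// assumption for illustration.
std::optional<SrcSel> selForAndMask(uint32_t Mask) {
  switch (Mask) {
  case 0x000000ffu: return SrcSel::BYTE_0;
  case 0x0000ffffu: return SrcSel::WORD_0;
  default:          return std::nullopt; // mask is not a plain byte/word select
  }
}

// Reference semantics of a selector: read the selected bits of a 32-bit
// value, zero-extended to 32 bits.
uint32_t readSel(uint32_t Value, SrcSel Sel) {
  switch (Sel) {
  case SrcSel::BYTE_0: return Value & 0xffu;
  case SrcSel::BYTE_1: return (Value >> 8) & 0xffu;
  case SrcSel::BYTE_2: return (Value >> 16) & 0xffu;
  case SrcSel::BYTE_3: return (Value >> 24) & 0xffu;
  case SrcSel::WORD_0: return Value & 0xffffu;
  case SrcSel::WORD_1: return (Value >> 16) & 0xffffu;
  }
  return 0; // unreachable for valid selectors
}
```

With these definitions, readSel(V, *selForAndMask(0xff)) yields the same value as V & 0xff, which is the equivalence the peephole relies on when it replaces the AND with an SDWA operand.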

With this change, the peephole checks whether an operand is either an immediate or a register that is a copy of an immediate.
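
The check described above can be sketched on a toy model. This is an illustration under stated assumptions, not the code in SIPeepholeSDWA.cpp: Imm, Reg, Operand and MovImmDefs are hypothetical types, std::optional stands in for llvm::Optional, and the same-basic-block and subreg/super-reg checks discussed in the review are omitted. Only the name foldToImm and the Optional<int64_t> return type come from the review thread.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <variant>

// Toy operand: either a literal immediate or a virtual register number.
struct Imm { int64_t Value; };
struct Reg { unsigned Id; };
using Operand = std::variant<Imm, Reg>;

// Toy def table (hypothetical): for every register defined by a
// move-immediate such as S_MOV_B32, the immediate that was moved into it.
using MovImmDefs = std::map<unsigned, int64_t>;

// Return the immediate an operand evaluates to, or nothing if it is
// neither an immediate nor a register known to be a copy of one.
std::optional<int64_t> foldToImm(const Operand &Op, const MovImmDefs &Defs) {
  // Case 1: the operand is already an immediate.
  if (const Imm *I = std::get_if<Imm>(&Op))
    return I->Value;

  // Case 2: the operand is a register; look through its defining move.
  const Reg &R = std::get<Reg>(Op);
  auto It = Defs.find(R.Id);
  if (It != Defs.end())
    return It->second;

  return std::nullopt; // defined by something else, cannot fold
}
```

In terms of the example, with %vreg0 recorded as a move of 0xff, foldToImm on the %vreg0 operand of either V_AND_B32_e32 returns 0xff, so the pattern matches as if the immediate had been written inline. Returning an optional instead of a bool plus out argument keeps "no immediate" and the value in a single result.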

Event Timeline

SamWot created this revision. Mar 29 2017, 2:10 AM
SamWot edited the summary of this revision. Mar 29 2017, 2:11 AM
SamWot added a subscriber: Restricted Project.
vpykhtin edited edge metadata. Mar 29 2017, 4:35 AM

Mostly good.

lib/Target/AMDGPU/SIInstrInfo.cpp
1501

Is this check reliable enough?

lib/Target/AMDGPU/SIPeepholeSDWA.cpp
380

You can use llvm::Optional<int64_t> as the return value and eliminate the out arg.

I would rename this function, as the current name is a bit misleading; something like foldToImm.

390

When can a def operand not be the same reg?

394

Is isSameBB really needed?

SamWot updated this revision to Diff 93358. Mar 29 2017, 5:34 AM

Changed isImm() to foldToImm(), which returns Optional<int64_t>.

SamWot added inline comments. Mar 29 2017, 5:36 AM
lib/Target/AMDGPU/SIInstrInfo.cpp
1501

This was a static function in SIFoldOperands.cpp.

lib/Target/AMDGPU/SIPeepholeSDWA.cpp
390

When it is a subreg or super-reg of this register.

394

Not really, but since this pass is still local to a basic block, I think it is better to keep it.

vpykhtin accepted this revision. Mar 29 2017, 5:37 AM

LGTM.

This revision is now accepted and ready to land. Mar 29 2017, 5:37 AM
SamWot closed this revision. Mar 31 2017, 5:44 AM

Submitted in r299202