This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Improve materialization for immediates which is almost a 32 bit splat.
ClosedPublic

Authored by Esme on Dec 12 2022, 12:28 AM.

Details

Summary

Some 64 bit constants can be materialized with fewer instructions than we currently use.
We consider a 64 bit immediate value divided into four parts, Hi16OfHi32 (bits 48...63), Lo16OfHi32 (bits 32...47), Hi16OfLo32 (bits 16...31), Lo16OfLo32 (bits 0...15). When any three parts are equal, the immediate can be treated as "almost" a splat of a 32 bit value in a 64 bit register.
For example:

define  i64 @almost_splat() {
entry:
  ; 0xCCFFCCFF0123CCFF (Hi16OfHi32 == Lo16OfHi32 == Lo16OfLo32)
  ret i64 14771750698406366463
}
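
The four-part split described above can be sketched in C++ as follows. This is only an illustration, not code from the patch: `ImmParts`, `splitImm` and `isAlmostSplat` are hypothetical names, and the sketch assumes that "almost a splat" means exactly three of the four parts are equal (a value whose four parts all match is treated as a full splat and excluded).

```cpp
#include <cassert>
#include <cstdint>

// The four 16-bit parts of a 64-bit immediate, named as in the summary.
struct ImmParts {
  uint16_t Hi16OfHi32; // bits 48...63
  uint16_t Lo16OfHi32; // bits 32...47
  uint16_t Hi16OfLo32; // bits 16...31
  uint16_t Lo16OfLo32; // bits 0...15
};

inline ImmParts splitImm(uint64_t Imm) {
  return {static_cast<uint16_t>(Imm >> 48), static_cast<uint16_t>(Imm >> 32),
          static_cast<uint16_t>(Imm >> 16), static_cast<uint16_t>(Imm)};
}

// Hypothetical helper (not from the patch): true when exactly three of the
// four parts are equal. Assumption: all four equal is a full splat and is
// not counted here.
inline bool isAlmostSplat(uint64_t Imm) {
  const ImmParts P = splitImm(Imm);
  const uint16_t V[4] = {P.Hi16OfHi32, P.Lo16OfHi32, P.Hi16OfLo32,
                         P.Lo16OfLo32};
  for (int I = 0; I < 4; ++I) {
    int Matches = 0; // how many parts (including V[I] itself) equal V[I]
    for (int J = 0; J < 4; ++J)
      Matches += (V[J] == V[I]);
    if (Matches == 3)
      return true;
  }
  return false;
}

// Usage sketch with the constant from the summary.
inline void checkAlmostSplat() {
  assert(isAlmostSplat(0xCCFFCCFF0123CCFFULL));  // CCFF, CCFF, 0123, CCFF
  assert(!isAlmostSplat(0x0123CCFF0123CCFFULL)); // parts match only in pairs
  assert(!isAlmostSplat(0xCCFFCCFFCCFFCCFFULL)); // all four parts match
}
```

Calling `checkAlmostSplat()` from a `main` exercises the three cases.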

Currently we use 5 instructions to materialize the immediate:

# %bb.0:                                # %entry
	lis 3, -13057
	ori 3, 3, 52479
	rldic 3, 3, 32, 0
	oris 3, 3, 291
	ori 3, 3, 52479
	blr

To improve on that, we can use 3 instructions to generate the splat and 1 instruction to modify the part that differs:

# %bb.0:                                # %entry
	lis 3, 291
	ori 3, 3, 52479
	rldimi 3, 3, 32, 0       # 0x0123CCFF0123CCFF is generated here
	rldimi 3, 3, 48, 0       # modify Hi16OfHi32, then we get 0xCCFFCCFF0123CCFF
	blr
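
Both sequences can be checked by hand with a small emulation of the instructions involved. The sketch below is a simplified model written for this example, not LLVM code: the helper names are made up, register 3 is modelled as a plain `uint64_t`, and masks use the big-endian bit numbering of the ISA (bit 0 is the most significant bit).

```cpp
#include <cassert>
#include <cstdint>

// Rotate left; the N == 0 case avoids an out-of-range shift by 64.
inline uint64_t rotl64(uint64_t X, unsigned N) {
  return N == 0 ? X : (X << N) | (X >> (64 - N));
}

// Ones in big-endian bit positions MB..ME, wrapping around when MB > ME.
inline uint64_t maskBE(unsigned MB, unsigned ME) {
  const uint64_t FromMB = ~0ULL >> MB;      // bits MB..63
  const uint64_t ToME = ~0ULL << (63 - ME); // bits 0..ME
  return MB <= ME ? (FromMB & ToME) : (FromMB | ToME);
}

// lis: sign-extended 16-bit immediate placed in bits 16...31.
inline uint64_t lis(int16_t SI) {
  return static_cast<uint64_t>(static_cast<int64_t>(SI) * 65536);
}
inline uint64_t ori(uint64_t RS, uint16_t UI) { return RS | UI; }
inline uint64_t oris(uint64_t RS, uint16_t UI) {
  return RS | (static_cast<uint64_t>(UI) << 16);
}
// rldic: rotate left, then keep only the bits under mask MB..63-SH.
inline uint64_t rldic(uint64_t RS, unsigned SH, unsigned MB) {
  return rotl64(RS, SH) & maskBE(MB, 63 - SH);
}
// rldimi: rotate left, then insert under mask MB..63-SH into RA.
inline uint64_t rldimi(uint64_t RA, uint64_t RS, unsigned SH, unsigned MB) {
  const uint64_t M = maskBE(MB, 63 - SH);
  return (rotl64(RS, SH) & M) | (RA & ~M);
}

// The 5-instruction sequence from the summary.
inline uint64_t materializeOld() {
  uint64_t R3 = lis(-13057);
  R3 = ori(R3, 52479);
  R3 = rldic(R3, 32, 0);
  R3 = oris(R3, 291);
  return ori(R3, 52479);
}

// The 4-instruction sequence from the summary.
inline uint64_t materializeNew() {
  uint64_t R3 = lis(291);
  R3 = ori(R3, 52479);
  R3 = rldimi(R3, R3, 32, 0); // 0x0123CCFF0123CCFF
  return rldimi(R3, R3, 48, 0);
}

inline void checkSequences() {
  assert(materializeOld() == 0xCCFFCCFF0123CCFFULL);
  assert(materializeNew() == 0xCCFFCCFF0123CCFFULL);
}
```

Under this model both sequences yield 0xCCFFCCFF0123CCFF, so the only difference is the instruction count.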

Event Timeline

Esme created this revision. Dec 12 2022, 12:28 AM
Herald added a project: Restricted Project. Dec 12 2022, 12:28 AM
Esme requested review of this revision. Dec 12 2022, 12:28 AM
shchenz added inline comments. Dec 12 2022, 5:51 AM
llvm/test/CodeGen/PowerPC/constants-i64.ll
378

Please commit the new cases first.

Esme updated this revision to Diff 482368. Dec 12 2022, 11:29 PM

Address comments.

shchenz added inline comments. Dec 23 2022, 3:08 AM
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1389

For compile-time reasons, since the 1-instruction patterns for Imm & 0xffffffff00000000 are simple, can we move the new code before line 1318 and do a simple check for the 1-instruction pattern in selectI64ImmDirect?

1396

nit: there is a getI32Imm lambda at line 1314.

1412

I don't get the point here.
original input: 0xAAAABBBBAAAACCCC
splat result: 0xAAAABBBBAAAABBBB
after the ORI8: 0xAAAABBBBAAAABBBB | 0xCCCC != 0xAAAABBBBAAAACCCC ?

Seems another rldimi is needed to insert Lo16OfLo32 to the splat result.
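
The arithmetic behind this objection can be checked directly. In the sketch below, `insertLo16` is a hypothetical helper that only models the effect of a mask-based insert (the kind of operation rldimi performs); it is not a specific instruction encoding and not code from the patch.

```cpp
#include <cassert>
#include <cstdint>

// OR can only set bits, so it cannot turn 0xBBBB into 0xCCCC.
inline uint64_t orLo16(uint64_t Reg, uint16_t V) { return Reg | V; }

// Hypothetical model of a mask-based insert: clear the low 16 bits first,
// then place the new value there.
inline uint64_t insertLo16(uint64_t Reg, uint16_t V) {
  return (Reg & ~0xFFFFULL) | V;
}

inline void checkLo16Fixup() {
  const uint64_t Splat = 0xAAAABBBBAAAABBBBULL;
  const uint64_t Want = 0xAAAABBBBAAAACCCCULL;
  assert(orLo16(Splat, 0xCCCC) == 0xAAAABBBBAAAAFFFFULL); // 0xBBBB | 0xCCCC
  assert(orLo16(Splat, 0xCCCC) != Want);
  assert(insertLo16(Splat, 0xCCCC) == Want);
}
```

The OR leaves 0xFFFF in the low half-word because the bits already set by 0xBBBB are never cleared, which is why an insert is needed instead.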

Esme updated this revision to Diff 487248. Jan 8 2023, 5:24 PM
Esme edited the summary of this revision.

Addressed comments and verified the materialization results.
However, I can't find a suitable instruction to turn 0xABCD ADDD ABCD ADDD into 0xABCD ABCD ABCD ADDD, i.e. to modify Lo16OfHi32 (bits 32...47), so I didn't handle patterns like 0xABCD ABCD ABCD ADDD.
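
For rldimi specifically, this can be confirmed by brute force. The sketch below is an illustration rather than code from the patch: it uses a simplified model of rldimi with the splat register as both source and destination, tries every (SH, MB) pair, and counts those that yield the target. It says nothing about other instructions, and the helper names are made up.

```cpp
#include <cassert>
#include <cstdint>

inline uint64_t rotl64(uint64_t X, unsigned N) {
  return N == 0 ? X : (X << N) | (X >> (64 - N));
}

// Ones in big-endian bit positions MB..ME, wrapping around when MB > ME.
inline uint64_t maskBE(unsigned MB, unsigned ME) {
  const uint64_t FromMB = ~0ULL >> MB;
  const uint64_t ToME = ~0ULL << (63 - ME);
  return MB <= ME ? (FromMB & ToME) : (FromMB | ToME);
}

// Simplified rldimi model: rotate RS left by SH, insert under MB..63-SH.
inline uint64_t rldimi(uint64_t RA, uint64_t RS, unsigned SH, unsigned MB) {
  const uint64_t M = maskBE(MB, 63 - SH);
  return (rotl64(RS, SH) & M) | (RA & ~M);
}

// Number of (SH, MB) pairs for which "rldimi r, r, SH, MB" turns Splat
// into Target.
inline int countRldimiFixups(uint64_t Splat, uint64_t Target) {
  int Count = 0;
  for (unsigned SH = 0; SH < 64; ++SH)
    for (unsigned MB = 0; MB < 64; ++MB)
      Count += (rldimi(Splat, Splat, SH, MB) == Target);
  return Count;
}

inline void checkFixups() {
  // Handled case from the summary: SH = 48, MB = 0 works.
  assert(countRldimiFixups(0x0123CCFF0123CCFFULL, 0xCCFFCCFF0123CCFFULL) > 0);
  // Unhandled case: no self-insert changes only Lo16OfHi32.
  assert(countRldimiFixups(0xABCDADDDABCDADDDULL, 0xABCDABCDABCDADDDULL) == 0);
}
```

The search finds the `rldimi 3, 3, 48, 0` fixup for the constant in the summary and no single self-insert for 0xABCD ABCD ABCD ADDD.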

Esme added inline comments. Jan 8 2023, 5:35 PM
llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1389

Sorry, I didn't quite understand this comment.
Imm & 0xffffffff00000000 always takes more than 1 instruction because it cannot match either of the 1-instruction patterns:

// 1-1) Patterns : {zeros}{15-bit value}
//                 {ones}{15-bit value}

// 1-2) Patterns : {zeros}{15-bit value}{16 zeros}
//                 {ones}{15-bit value}{16 zeros}
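
Read as range checks, the two patterns amount to "a sign-extended 16-bit value" and "a sign-extended 32-bit value whose low 16 bits are zero". The sketch below encodes that reading; the function names are hypothetical and this is not the code in selectI64ImmDirect.

```cpp
#include <cassert>
#include <cstdint>

// Pattern 1-1: {zeros or ones}{15-bit value}, a sign-extended 16-bit value.
inline bool matchesPattern11(int64_t Imm) {
  return Imm >= INT16_MIN && Imm <= INT16_MAX;
}

// Pattern 1-2: {zeros or ones}{15-bit value}{16 zeros}, a sign-extended
// 32-bit value whose low 16 bits are all zero.
inline bool matchesPattern12(int64_t Imm) {
  return (Imm & 0xFFFF) == 0 && Imm >= INT32_MIN && Imm <= INT32_MAX;
}

inline bool fitsOneInstruction(int64_t Imm) {
  return matchesPattern11(Imm) || matchesPattern12(Imm);
}

// Keep only the high 32 bits, as in Imm & 0xffffffff00000000.
inline int64_t highHalfOnly(uint64_t Imm) {
  return static_cast<int64_t>(Imm & 0xFFFFFFFF00000000ULL);
}

inline void checkOneInstructionPatterns() {
  assert(fitsOneInstruction(0x7FFF));     // 1-1, zeros on top
  assert(fitsOneInstruction(-1));         // 1-1, ones on top
  assert(fitsOneInstruction(0x12340000)); // 1-2
  // A non-zero high half with a zero low half lies outside both ranges.
  assert(!fitsOneInstruction(highHalfOnly(0xCCFFCCFF0123CCFFULL)));
  assert(!fitsOneInstruction(highHalfOnly(0x0000000100000000ULL)));
}
```

Any non-zero result of the masking is at least 2^32 in magnitude, so it falls outside both ranges.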

Addressed comments and verified the materialization results.
However, I can't find a suitable instruction to turn 0xABCD ADDD ABCD ADDD into 0xABCD ABCD ABCD ADDD, i.e. to modify Lo16OfHi32 (bits 32...47), so I didn't handle patterns like 0xABCD ABCD ABCD ADDD.

Thanks, the new selection seems correct. For the pattern 0xABCD ABCD ABCD ADDD, yeah, I cannot find a 4-instruction selection for it either.

llvm/lib/Target/PowerPC/PPCISelDAGToDAG.cpp
1389

I mean the new code can be moved before line 1317 with some simple checks. For example, if selecting Imm & 0xffffffff00000000 requires no fewer than 2 instructions and the lower 32 bits also require 2 instructions, the new splat handling can be done first. That avoids generating a 5-instruction pattern and then marking those instructions as dead, even when the new code is hit.

Esme updated this revision to Diff 488268. Jan 11 2023, 9:54 AM

Addressed comments.

shchenz accepted this revision as: shchenz. Jan 11 2023, 11:56 PM

LGTM. Thanks for the improvement. Please wait a few days for other reviewers.

This revision is now accepted and ready to land. Jan 11 2023, 11:56 PM
This revision was landed with ongoing or failed builds. Jan 31 2023, 3:03 AM
This revision was automatically updated to reflect the committed changes.