This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Optimize SEW=64 shifts by splat on RV32.
Closed, Public

Authored by craig.topper on May 14 2021, 12:02 PM.

Details

Summary

SEW=64 shifts only use the low log2(64) = 6 bits of the shift amount. If
we're splatting a 64-bit value in 2 parts, we can avoid splatting the
upper bits and just let the low bits be sign extended. The upper bits
won't be read anyway.
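
As a rough illustration of the paragraph above (not code from the patch), the following Python sketch models a SEW=64 left shift that reads only the low 6 bits of its shift amount. The helper names `combine`, `sext32_to_64` and `shl_sew64` are hypothetical and exist only for this example. It shows that a shift amount built from both halves (hi:lo) and one built by sign extending lo give the same result under that model.

```python
MASK64 = (1 << 64) - 1


def combine(hi: int, lo: int) -> int:
    """Build a 64-bit value from two 32-bit halves (hi:lo)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def sext32_to_64(lo: int) -> int:
    """Sign-extend a 32-bit value to a 64-bit bit pattern."""
    lo &= 0xFFFFFFFF
    if lo & 0x80000000:
        return lo | 0xFFFFFFFF00000000
    return lo


def shl_sew64(value: int, amount: int) -> int:
    """Model of a 64-bit left shift that reads only the low 6 bits of amount."""
    return (value << (amount & 63)) & MASK64


# The upper 58 bits of the amount never influence the result, so the
# full (hi:lo) amount and the sign-extended lo behave identically.
for hi, lo in [(0, 5), (0xDEADBEEF, 5), (0, 0x80000003), (0x12345678, 0xFFFFFFFF)]:
    assert shl_sew64(1, combine(hi, lo)) == shl_sew64(1, sext32_to_64(lo))
```

Only bits 0 to 5 of lo matter in this model, and both ways of forming the 64-bit amount preserve them.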

For the purposes of SelectionDAG semantics of the generic ISD opcodes,
if hi is non-zero or bit 31 of lo is 1, the shift was already
undefined, so it should be ok to replace hi with the sign extension of lo.
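
The argument above can be checked with a small sketch, assuming that a generic 64-bit shift is only defined for amounts below 64. The helpers `combine`, `sext32_to_64` and `shift_is_defined` are hypothetical names for this example, not LLVM APIs. Whenever the shift is defined, hi is zero and bit 31 of lo is clear, so sign extending lo reproduces the original 64-bit amount exactly.

```python
def combine(hi: int, lo: int) -> int:
    """Build a 64-bit value from two 32-bit halves (hi:lo)."""
    return ((hi & 0xFFFFFFFF) << 32) | (lo & 0xFFFFFFFF)


def sext32_to_64(lo: int) -> int:
    """Sign-extend a 32-bit value to a 64-bit bit pattern."""
    lo &= 0xFFFFFFFF
    if lo & 0x80000000:
        return lo | 0xFFFFFFFF00000000
    return lo


def shift_is_defined(hi: int, lo: int) -> bool:
    """Assumed rule: a 64-bit shift is defined only for amounts below 64."""
    return combine(hi, lo) < 64


# Every defined amount has hi == 0 and lo < 64, so the sign extension of
# lo is the same 64-bit value as (hi:lo).
for lo in range(64):
    assert shift_is_defined(0, lo)
    assert sext32_to_64(lo) == combine(0, lo)

# A non-zero hi, or bit 31 of lo being set, means the amount is >= 64,
# so the shift was already undefined before the replacement.
assert not shift_is_defined(1, 3)
assert not shift_is_defined(0, 0x80000000)
```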

In order to be able to find the split i64 value before it becomes
a stack operation, I added a new ISD opcode that will be expanded
to the stack spill in PreprocessISelDAG. This new node is conceptually
similar to BuildPairF64, but I expand it earlier so that we can
go through regular isel to get the right VLSE opcode for the LMUL.
BuildPairF64 is expanded in a CustomInserter.

Event Timeline

craig.topper created this revision. May 14 2021, 12:02 PM
craig.topper requested review of this revision. May 14 2021, 12:02 PM
Herald added a project: Restricted Project. May 14 2021, 12:02 PM
Herald added a subscriber: MaskRay.

I probably don't have the full picture here, but I was wondering if there was a generic "simplify demanded bits" we could hook this node into, since presumably LLVM already knows that only those bits of the shift are used. Does SimplifyDemandedBitsForTargetNode not apply here?

I think for generic shift nodes, LLVM considers all bits of the shift amount to be demanded. Dropping the upper bits is not generally safe because not all targets have shift instructions that take the shift amount modulo the element width.
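
To illustrate the point about targets differing, here is a minimal sketch with two made-up shift models: one that reduces the amount modulo the width, and one that produces zero for any amount at or above the width. Both functions are hypothetical models for this example and do not describe any specific instruction. Truncating the amount to its low 6 bits is harmless on the first model but changes the result on the second.

```python
def shl_masking(value: int, amount: int, width: int = 64) -> int:
    """Model of a shift that uses the amount modulo the width."""
    return (value << (amount % width)) & ((1 << width) - 1)


def shl_saturating(value: int, amount: int, width: int = 64) -> int:
    """Model of a shift that yields 0 once the amount reaches the width."""
    if amount >= width:
        return 0
    return (value << amount) & ((1 << width) - 1)


amount = 64 + 3          # out-of-range shift amount
truncated = amount & 63  # keep only the low 6 bits

# On the masking model the truncation is invisible.
assert shl_masking(1, amount) == shl_masking(1, truncated)

# On the saturating model the truncation changes the answer.
assert shl_saturating(1, amount) != shl_saturating(1, truncated)
```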

frasercrmck accepted this revision. May 26 2021, 9:30 AM

Interesting, that makes sense. Thanks.

Then this is probably the best way of doing it. LGTM.

This revision is now accepted and ready to land. May 26 2021, 9:30 AM
This revision was automatically updated to reflect the committed changes.