This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Teach MatInt to use (ADD_UW X, (SLLI X, 32)) to materialize some constants.
Closed, Public

Authored by craig.topper on Aug 31 2023, 12:12 AM.

Details

Summary

If the high and low 32 bits are the same, we try to use
(ADD X, (SLLI X, 32)) but that only works if bit 31 is clear since
the low 32 bits will be sign extended.

If we have Zba we can use add.uw to zero the sign extended bits.
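The difference between the two sequences can be checked with a small model of the instructions involved. This is only an illustrative sketch, not code from the patch: the helper names (`sext32`, `addUW`, `viaAdd`, `viaAddUW`, `repeated`) are made up here, and it assumes X was materialized as a sign-extended 32-bit value, as LUI/ADDIW would leave it on RV64.

```cpp
#include <cassert>
#include <cstdint>

// Assumed starting point: a 32-bit pattern sign-extended to 64 bits,
// which is what LUI/ADDIW leaves in a register on RV64.
constexpr uint64_t sext32(uint32_t Lo) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(Lo)));
}

// ADD rd, rs1, rs2
constexpr uint64_t add(uint64_t Rs1, uint64_t Rs2) { return Rs1 + Rs2; }

// ADD.UW rd, rs1, rs2 (Zba): zero-extend the low 32 bits of rs1, then add rs2.
constexpr uint64_t addUW(uint64_t Rs1, uint64_t Rs2) {
  return (Rs1 & 0xFFFFFFFFULL) + Rs2;
}

// SLLI rd, rs1, 32
constexpr uint64_t slli32(uint64_t Rs1) { return Rs1 << 32; }

// The constant we want: the same 32-bit pattern in both halves.
constexpr uint64_t repeated(uint32_t Lo) {
  return (static_cast<uint64_t>(Lo) << 32) | Lo;
}

// (ADD X, (SLLI X, 32))
constexpr uint64_t viaAdd(uint32_t Lo) {
  return add(sext32(Lo), slli32(sext32(Lo)));
}

// (ADD_UW X, (SLLI X, 32))
constexpr uint64_t viaAddUW(uint32_t Lo) {
  return addUW(sext32(Lo), slli32(sext32(Lo)));
}

void checkExamples() {
  // Bit 31 clear: both sequences produce the repeated constant.
  assert(viaAdd(0x12345678u) == repeated(0x12345678u));
  assert(viaAddUW(0x12345678u) == repeated(0x12345678u));
  // Bit 31 set: the sign-extension bits of X corrupt the upper half for ADD,
  // while ADD.UW zeroes them before adding.
  assert(viaAdd(0x80000001u) != repeated(0x80000001u));
  assert(viaAddUW(0x80000001u) == repeated(0x80000001u));
}
```

With bit 31 set, the plain ADD adds 0xFFFFFFFF into the upper half, so the result is off by one there; add.uw avoids this by discarding the upper 32 bits of X first.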

Event Timeline

craig.topper created this revision. Aug 31 2023, 12:12 AM
Herald added a project: Restricted Project. Aug 31 2023, 12:12 AM
craig.topper requested review of this revision. Aug 31 2023, 12:12 AM
Herald added subscribers: eopXD, MaskRay.
wangpc accepted this revision. Aug 31 2023, 12:56 AM

It seems reasonable to me.
LGTM.

This revision is now accepted and ready to land. Aug 31 2023, 12:56 AM
reames accepted this revision. Aug 31 2023, 10:26 AM

LGTM.

Mostly for completeness' sake, here's how I convinced myself this is correct.

  • If we could generate an unsigned 32-bit constant in the 64-bit register, we could do OR((shl C, 32), C). The ADD and OR are equivalent in this case because the two operands have no set bits in common.
  • Generating a zext(i32) is hard, but we can generate a sext(i32) and then truncate. We then have to put the zext/trunc somewhere.
  • add.uw applies a trunc/zext operation to one input.
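The ADD/OR equivalence in the first bullet follows from the identity a + b = (a ^ b) + 2 * (a & b): when the operands share no set bits, the carry term vanishes. A minimal sketch of that step, using hypothetical helper names that do not come from the patch:

```cpp
#include <cassert>
#include <cstdint>

// True when the two operands have no set bits in common.
constexpr bool disjoint(uint64_t A, uint64_t B) { return (A & B) == 0; }

// zext(i32): the 32-bit constant with the upper 32 bits cleared.
constexpr uint64_t zext32(uint32_t C) { return C; }

// sext(i32): the 32-bit constant with bit 31 copied into the upper 32 bits.
constexpr uint64_t sext32(uint32_t C) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(C)));
}

constexpr uint64_t repeatWithOr(uint32_t C) {
  return (zext32(C) << 32) | zext32(C);
}

constexpr uint64_t repeatWithAdd(uint32_t C) {
  return (zext32(C) << 32) + zext32(C);
}

void checkDisjointness() {
  // With a zero-extended input the halves never overlap, so ADD == OR.
  assert(disjoint(zext32(0xFFFFFFFFu) << 32, zext32(0xFFFFFFFFu)));
  assert(repeatWithOr(0x80000001u) == repeatWithAdd(0x80000001u));
  // With a sign-extended input and bit 31 set, the halves do overlap,
  // which is why the zext/trunc has to be applied somewhere.
  assert(!disjoint(sext32(0x80000001u) << 32, sext32(0x80000001u)));
}
```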

A couple of thoughts off that.

  • It would be super handy to have a sh32add.uw. Too bad we don't.
  • shl.uw allows us to produce a 32 bit contiguous constant at any point in the i64.
  • sh3add.uw allows us to produce a 35 bit constant or a sign extended 35 bit constant in at most four instructions.
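For reference, the two existing Zba instructions mentioned above can be modelled as follows (the shift instruction is spelled slli.uw in the ISA, as clarified later in the thread). This is one possible reading of the comment, not code from the patch: `build35` is a hypothetical helper, it covers only the unsigned 35-bit case, and it does not verify the instruction-count claim.

```cpp
#include <cassert>
#include <cstdint>

// Assumed input form: a 32-bit pattern sign-extended to 64 bits.
constexpr uint64_t sext32(uint32_t C) {
  return static_cast<uint64_t>(static_cast<int64_t>(static_cast<int32_t>(C)));
}

// SLLI.UW rd, rs1, shamt (Zba): zero-extend the low 32 bits of rs1, then shift.
constexpr uint64_t slliUW(uint64_t Rs1, unsigned Shamt) {
  return (Rs1 & 0xFFFFFFFFULL) << Shamt;
}

// SH3ADD.UW rd, rs1, rs2 (Zba): rd = rs2 + (zext(rs1[31:0]) << 3).
constexpr uint64_t sh3addUW(uint64_t Rs1, uint64_t Rs2) {
  return Rs2 + ((Rs1 & 0xFFFFFFFFULL) << 3);
}

// Hypothetical split of an unsigned 35-bit constant: the upper 32 bits go
// through the .uw operand, the low 3 bits come from a second small register.
constexpr uint64_t build35(uint64_t C) {
  return sh3addUW(sext32(static_cast<uint32_t>(C >> 3)), C & 7);
}

void checkZbaModels() {
  // A 32-bit pattern placed at bit 16, with the sign-extension bits dropped.
  assert(slliUW(sext32(0xF0F0F0F0u), 16) == 0x0000F0F0F0F00000ULL);
  // Unsigned 35-bit constants rebuilt from a 32-bit part and a 3-bit part.
  assert(build35(0x423456789ULL) == 0x423456789ULL);
  assert(build35(0x7FFFFFFFFULL) == 0x7FFFFFFFFULL);
}
```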

I think sh32add.uw is equivalent to the PACK instruction from the old Zbp spec which took the lower 32 bits from two registers and concatenated them.
Did you mean slli.uw instead of shl.uw?

I think sh32add.uw is equivalent to the PACK instruction from the old Zbp spec which took the lower 32 bits from two registers and concatenated them.

Agreed. I wish we had that instruction, but unfortunately we don't.

Did you mean slli.uw instead of shl.uw?

Yeah. I swapped the llvm IR name and the instruction name.

This revision was landed with ongoing or failed builds. Aug 31 2023, 8:25 PM
This revision was automatically updated to reflect the committed changes.