Page MenuHomePhabricator

[RISCV] Fold low 12 bits into instruction during frame index elimination

Authored by reames on Nov 30 2022, 11:39 AM.



Fold the low 12 bits of an immediate offset into the offset field of the using instruction. That using instruction will be a load, store, or addi which performs an add of a signed 12-bit immediate as part of it's operation. Splitting out the low bits allows the high bits to be generated via a single LUI instead of needing an LUI/ADDI pair.

The codegen effect of this is mostly converting cases where "split addi" kicks in to using LUI + a folded offset. There are a couple of straight dynamic instruction count wins, and using a canonical LUI is probably better than a chain of SP adds if the dynamic instruction count is equal.

I'd appreciate careful review here. I'm not entirely sure this is correct without additional guards, and this is the type of bit math I have trouble reasoning about. My main concern is whether the addition of the signed add is the same as the ADDIW used on RV64 without additional special handling.

Diff Detail

Event Timeline

reames created this revision.Nov 30 2022, 11:39 AM
reames requested review of this revision.Nov 30 2022, 11:39 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2022, 11:39 AM
craig.topper added inline comments.Nov 30 2022, 11:48 AM

Can this just be

SignExtend64<32>(Val - Lo12)?

reames added inline comments.Nov 30 2022, 11:51 AM

See comment about being really bad at bitmath. I tried to follow the constant materialization code as closely as possible. :)

If you think this is correct, I'll make the change and report any differences I spot.

craig.topper added inline comments.Nov 30 2022, 2:33 PM

Thinking about it more, I'm not sure the SignExtend64<32> is correct. SignExtend<32>(Hi20 << 12) + SignExtend<12>(Lo) is not guaranteed to give back the original constant.

Val - Lo12 without a SignExtend64<32> would guarantee we'd produce the original value, but Val - Lo12 isn't guaranteed to be a simm32. Does later code require the offset to be simm32?

craig.topper added inline comments.Nov 30 2022, 2:37 PM

It would need to be (uint64_t)Val - (uint64_t)Lo12 to avoid UB.

reames updated this revision to Diff 479362.Dec 1 2022, 10:59 AM
asb accepted this revision.Dec 2 2022, 10:20 AM

This LGTM - I've read through and also run the GCC torture suite with it enabled. Anything related to frame indices is fiddly and error prone so it's not impossible there's a bug lurking - but I've failed to see one so far!

This revision is now accepted and ready to land.Dec 2 2022, 10:20 AM
This revision was landed with ongoing or failed builds.Dec 2 2022, 11:54 AM
This revision was automatically updated to reflect the committed changes.
piggynl added a subscriber: piggynl.Dec 6 2022, 6:02 AM