This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Add pre-emit pass to make more instructions compressible
ClosedPublic

Authored by lewis-revill on Nov 25 2020, 8:03 AM.

Details

Summary

When optimizing for size, this pass searches for instructions that are
prevented from being compressed by one of the following:

  1. The use of a single uncompressed register.
  2. A base register + offset where the offset is too large to be compressed and the base register may or may not already be compressed.

In the first case, if there is a compressed register available, then the
uncompressed register is copied to the compressed register and its uses
replaced. This is only done if there are enough uses that code size
would be improved.

In the second case, if a compressed register is available, then the
original base register is copied and adjusted such that:

new_base_register = base_register + adjustment
base_register + large_offset = new_base_register + small_offset

and the uses of the base register are replaced with the new base
register. Again this is only done if there are enough uses for code size
to be improved.

This pass was authored by Lewis Revill, with large offset optimization
added by Craig Blackmore.

Diff Detail

Event Timeline

craigblackmore created this revision.Nov 25 2020, 8:03 AM
craigblackmore requested review of this revision.Nov 25 2020, 8:03 AM
jrtc27 added inline comments.Nov 25 2020, 8:08 AM
llvm/test/CodeGen/RISCV/compress-instrs.ll
1 ↗(On Diff #307605)

This should be a MIR test running just the new pass so it's less fragile (and can also test more interesting cases where it's hard to coerce CodeGen into creating uncompressible instructions)

craigblackmore edited the summary of this revision. (Show Details)Nov 25 2020, 9:24 AM
frasercrmck added inline comments.Nov 25 2020, 9:28 AM
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
215

Is this a strict enough check for "definition"? What about super- and sub-registers or clobbers? Would MI.modifiesRegister work better (and be simpler)?

263

Same question about defined...ness here.

279

Nit: redundant parens.

khchen added a subscriber: khchen.Nov 25 2020, 5:24 PM

Updated to address feedback from @jrtc27 and @frasercrmck and fix lint warnings.

craigblackmore marked 4 inline comments as done.Dec 11 2020, 9:46 AM
craigblackmore added inline comments.
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
215

Thanks, MI.modifiesRegister is better and makes this check much simpler.

263

At the moment isDef is sufficient as we are only expecting LW or SW. Given we are post regalloc, that means only physical integer registers (which do not have subregisters). I have added an assert so that it is clear we are only expecting LW or SW and a comment to say that the definedness check may need to be strengthened if the pass is extended to support more instructions.

llvm/test/CodeGen/RISCV/compress-instrs.ll
1 ↗(On Diff #307605)

Thanks, I have replaced this with a MIR test.

craigblackmore marked 3 inline comments as done and 2 inline comments as done.Dec 11 2020, 9:52 AM
jrtc27 added inline comments.Dec 11 2020, 9:57 AM
llvm/test/CodeGen/RISCV/compress-instrs.mir
2

Please indent continuation lines with an additional 4 spaces.

jrtc27 added inline comments.Dec 11 2020, 9:59 AM
llvm/test/CodeGen/RISCV/compress-instrs.mir
2

Or maybe the style is just 2. For | it seems to be generally 2, but for arguments that wrap it seems to be any of 2, 3 or 4 depending on where you look.

Indented continuation lines in the MIR test. I couldn't see a consistent style, so opted for an indent of 2.

craigblackmore marked 2 inline comments as done.Dec 17 2020, 2:32 AM
craig.topper added inline comments.
llvm/lib/Target/RISCV/CMakeLists.txt
25

I think these were in alphabetical order

llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
202

Use SmallVectorImpl so you don't need to repeat the 8 from the caller.

lenary resigned from this revision.Jan 14 2021, 9:43 AM

Addressed comments from @craig.topper.

craigblackmore marked 2 inline comments as done.Jan 18 2021, 8:59 AM

Some comments about current hard-coded assumptions. Whilst only supporting LW/SW at the moment is fine, I don't think it's a good idea to have such assumptions embedded in the design; instead it should be approximately general enough that other widths and types can be added at a later date without overhauling its core (i.e. just adding some more cases). Though at that point you might as well just support LD/SD and FLW/FSW/FLD/FSD given it'd then be trivial...

llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
100

Compute this based on the size of the register being loaded/stored, as this approach does not generalise.

105

You'll need the type here.

142

Ditto; compute this based on the size of the register.

190

NoRegister

245

You need the type of the register (int or float) in order to support compressing floating-point loads/stores (and so we can support capabilities in our downstream CHERI fork, both for the base and for the source/destination).

Removed hard-coded assumptions noted by @jrtc27 and rebased. Thank you for the review and apologies for the delay in updating the patch.

lenary removed a subscriber: lenary.Oct 15 2021, 2:00 AM
asb accepted this revision.Nov 6 2021, 6:54 AM

This looks good to me - many thanks for the contribution.

@jrtc27 - are you happy your comments were all addressed?

This revision is now accepted and ready to land.Nov 6 2021, 6:54 AM
jrtc27 added a comment.Nov 6 2021, 9:03 AM

I don't understand why this is still limited to just LW and SW; you have all the bits for dealing with the differing immediate shifts already so it should be ~no effort to add LD and SD on top, and floating-point variants are just supporting additional register classes for the source and destination, some of which you again already have code for.

llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
109

Comment says bytes, this is bits

125

This is just 31 * width_in_bytes (or 31 << log2(width_in_bytes))

134

If this returned log2(bytes) (or you could compute it, but every user of the function is happy with log2(bytes)) then this is just

return isUIntN(6 + log2(bytes), Offset) && (Offset % (UINT64_C(1) << log2(bytes)) == 0);

(inlining a non-templated version of isShiftedUInt, since we don't seem to have an isShiftedUIntN in MathExtras.h, though perhaps it'd be better to just add it). Ditto for compressibleOffset just with 5 not 6.

164

You don't need the parens

215

There's a lot of duplicated code between loads and stores

craigblackmore edited the summary of this revision. (Show Details)

Add support for LD/SD/FLW/FLD/FSW/FSD and address other inline comments from @jrtc27.

jrtc27 added inline comments.Nov 22 2021, 4:40 AM
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
182

Stray semicolon

190

Can also be scaled by 8 for double-sized quantities (and 16 for us downstream). The scaling isn't really relevant here, just that it has a larger immediate (and no source/destination register restriction, as you already note).

195

Not rs2 in the case of loads, so I guess SrcDest or something?

203

I think this would be more helpful if it then went on to say what it actually did for stores

206

How often is storing an address to itself at offset 0 actually hit?..

207

I can't obviously see tests that exercise this?

llvm/test/CodeGen/RISCV/compress-instrs-i32-f-d.mir
1 ↗(On Diff #388866)

These will need +f and +d given you use F[LS][WD]

32 ↗(On Diff #388866)

Why null for all of these? That seems like a bad idea...

80 ↗(On Diff #388866)

This would be better as an offset to a real pointer, not a horrible inttoptr of a constant

jrtc27 added inline comments.Nov 22 2021, 4:48 AM
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
2

File name is also misleading. Makes it sound like where instruction compression happens, but it's not, it's a transformation to make the instructions more amenable to compression, at the expense (in most cases) of instruction count.

220

You can get the MBB from the MI, otherwise you risk passing the wrong MBB

298

What if you hit a def of the new register?

348

What about when Imm is 0 and all uses can be updated?

356

Is it ever possible for this to return false on anything other than the final MI? Shouldn't analyzeCompressibleUses have stopped on such an instruction?

craig.topper added inline comments.Nov 22 2021, 4:30 PM
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
78

This is also misleading like the filename.

101

There's little reason for this to return uint8_t. Just use unsigned.

140

Why do we need to compressibleOffset? Wouldn't the Offset & ~compressedLDSTOffsetMask(Opcode) return 0 for the same cases?

147

Function name should start with a verb action so isCompressedReg

154

isCompressibleLoad

160

isCompressibleStore

247

There should be a comma after instruction.

248

Comma after compressed

Thank you for the reviews. I have updated based on the feedback and rebased.

Herald added a project: Restricted Project. · View Herald TranscriptMar 4 2022, 11:41 AM
jrtc27 added inline comments.Mar 4 2022, 11:56 AM
llvm/lib/Target/RISCV/RISCVMakeCompressible.cpp
146 ↗(On Diff #413085)

You don't want to be transforming FLW on RV64, that's reused for the LD encodings (ditto stores)

164 ↗(On Diff #413085)

Weird spacing

llvm/test/CodeGen/RISCV/make-compressible-i32-f-d.mir
1 ↗(On Diff #413085)

The names for these are a bit dodgy IMO in that they encode far too much about the specific instruction sets used, so changing that requires changing the name. Maybe make-compressible.mir and the RV64-only one to be make-compressible-rv64.mir? Or split fp out from i32 into its own file?

42 ↗(On Diff #413085)

Please don't, ptrtoint is a terrible thing... store %p itself to a bitcasted %p

craigblackmore marked 19 inline comments as done.Mar 4 2022, 11:57 AM
craigblackmore added inline comments.
llvm/lib/Target/RISCV/RISCVCompressInstrs.cpp
207

I have added tests for this now.

298

I have added a comment about this.

// The new register was scavenged for the range of instructions that are
// being updated, therefore it should not be defined within this range
// except possibly in the final instruction.
348

I have added a TODO to possibly update all uses in the future.

356

I have removed the check of the return value and removed the bool from updateOperands since it is no longer used.

craigblackmore updated this revision to Diff 414422.EditedMar 10 2022, 10:24 AM
craigblackmore marked an inline comment as done.

Address latest review.

  • No longer transforming FLW/FSW on 64-bit.
  • Fixed weird spacing.
  • Renamed test files.
  • Used bitcast instead of ptrtoint.
asb added a comment.Mar 17 2022, 7:47 AM

Any thoughts from anyone on whether this should be turned on at Oz only? It can certainly increase the number of instructions executed.

Enable at -Oz instead of -Os.

Do you have an example of a benchmark or some other codebase where this pass makes a difference? I tested this in 3 different places and there were no changes at all.

I tested this in 3 different places and there were no changes at all.

Actually, in OpenTitan I see up to 0.5% code size gains (60 bytes) in the best case.

The pass saves 456 bytes (0.76%) on a proprietary application.

If no one has additional concerns I think this could be merged.
Thanks for the info regarding the code size gains.

asb added a comment.EditedApr 14 2022, 5:11 AM

@craigblackmore: could you please take a look at compiling complex-2.c from the GCC torture suite with this patch. I'm getting incorrect output for RV32IMAFDC at Oz:

./output_rv32imafdc_ilp32_Oz/complex-2.s:16:7: error: invalid operand for instruction
        addi    fa0, ft0, 0
                ^
./output_rv32imafdc_ilp32_Oz/complex-2.s:18:7: error: invalid operand for instruction
        addi    fa1, ft1, 0
                ^
lewis-revill commandeered this revision.Apr 25 2022, 5:13 AM
lewis-revill added a reviewer: craigblackmore.

Rebased including an update from using isShiftedUIntN to the templated replacement isShiftedUInt<N, S>. Addressed the issue of inserting incorrect instructions to copy FP registers to compressible FP registers by using FSGNJ_S and FSGNJ_D instead. We note that we can't have an additional offset to adjust the resulting register by if it is an FP register.

reames added a subscriber: reames.Apr 26 2022, 7:41 AM

Drive by review. I'm mostly using this as an exercise to get familiar with the code and the target, so please don't consider any of my comments blocking.

llvm/lib/Target/RISCV/RISCVMakeCompressible.cpp
100 ↗(On Diff #424881)

I would expect us to have a means for querying the width of a load/store somewhere, and for this to simply be the log of an existing utility. I haven't dug around, but are you sure that doesn't exist?

126 ↗(On Diff #424881)

Don't we have a means already for computing max offset representable in the addressing of an instruction? I'd expect us to,

305 ↗(On Diff #424881)

It's not clear me to me why the last instruction has to be a load as this assert would seem to imply. Could we instead directly assert this is the last instruction somehow?

320 ↗(On Diff #424881)

While this is a size opt, I'd expect improving code density to outweigh the effect of an extra register move and/or add. We should measure, but this might be useful even for O2.

343 ↗(On Diff #424881)

The algorithm here appears to be O(N^2). Can we do better here?

Drive by review. I'm mostly using this as an exercise to get familiar with the code and the target, so please don't consider any of my comments blocking.

Thanks very much for the review

llvm/lib/Target/RISCV/RISCVMakeCompressible.cpp
100 ↗(On Diff #424881)

I've had a brief look, and it seems to me the only existing way would be to use MI.memoperands_begin()->getSize(). Doesn't seem to be anything simple in TargetInstrInfo.h that we can use/override either to get this functionality.

126 ↗(On Diff #424881)

No, every place which would need to check this just repeats the same logic since it's so simple.

305 ↗(On Diff #424881)

We only look at loads and stores for compressibility, and as such if we see a compressible instruction which also defines the register we know it is a compressible load. We also know that since we stop when we see the next definition of the register we are looking at, but only after we've checked the defining instruction for compressibility. So the only way we are looking at a definition is if it's the final instruction.

320 ↗(On Diff #424881)

Until recently this was enabled at -Os but changed to -Oz due to the increase in number of instructions executed. I suggest keeping at -Oz and changing this later if there is a motivation for it.

343 ↗(On Diff #424881)

Hmm, I guess there'd be a way to accumulate the ranges that we're interested in as we go through in one iteration - though I think this would be far more difficult to implement and you might end up with issues to do with looking at instructions which are yet to be updated and making incorrect conclusions (I don't see any cause for that at a first glance but it's always possible).

The pass saves 456 bytes (0.76%) on a proprietary application.

The pass saves 160 bytes (0.101%) on a proprietary application, from 157974 bytes-->157814 bytes.

asb added a comment.May 17 2022, 8:04 AM

At the last sync-up call, @reames I believe said he had no further blocking comments (thanks for the review on this Philip). I think we're good to land this Lewis if you're happy with the current code.

Rebased prior to commit

This revision was landed with ongoing or failed builds.May 25 2022, 1:27 AM
This revision was automatically updated to reflect the committed changes.
iiiyours removed a subscriber: iiiyours.
This comment was removed by iiiyours.