This is an archive of the discontinued LLVM Phabricator instance.

[RISCV] Reorder the stack frame objects.
Needs ReviewPublic

Authored by lcvon007 on Aug 23 2023, 7:47 AM.

Details

Summary

The order of stack frame objects decides the offset size relative to sp/fp, and shorter offset is more possible to make the related instructions to be compressed and use less instructions to build the offset immediate. So it can improve the code size if we reorder the stack objects using proper cost model.

The precise cost model requires further complexity, and the overall gain isn't worth it. I reuse X86's
cost model that uses the estimated density, the cost is computed by
density = ObjectNumUses / ObjectSize,
ObjectNumUses is the number of instructions using the frame object, and the difference between x86 and RISCV is that we provide the double weight for ld/st instructions because it's more possible to be compressed.
ObjectSize is the size of frame object.
CodeSize may regress in some testcases if we don't add weight for ld/st(the reason is that more compressible ld/st get too much offset to stop them being compressed), and the double weight is estimate(other maybe better in some cases).
The main order algorithm is that the frame object with higher density gets shorter offset relative to sp/fp.

Diff Detail

Event Timeline

lcvon007 created this revision.Aug 23 2023, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 7:47 AM
lcvon007 requested review of this revision.Aug 23 2023, 7:47 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 23 2023, 7:47 AM
lcvon007 added a comment.EditedAug 23 2023, 7:51 AM

the optimization data for code size in Oz

the optimization data for code size in Oz(strip the symbol table)

the optimization data for code size in Oz with march rv64imafd(no c extension and strip the symbol table)


the code size of milc regress a little and I find that .text become less but .eh_frame_hdr/.eh_frame become larger.

the code size of newest order algorithm(split the stack into group with same alignment)

Can you describe the cost model in the patch description?

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1510

firstly -> first

lcvon007 retitled this revision from [RISCV] Reorder the stack objects. to [RISCV] Reorder the stack frame objects..Aug 23 2023, 7:13 PM
lcvon007 edited the summary of this revision. (Show Details)
lcvon007 updated this revision to Diff 552960.Aug 23 2023, 7:30 PM

add decription for cost model and modify the typo.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1510

done

Can you describe the cost model in the patch description?

done, thanks a lot

craig.topper added inline comments.Aug 23 2023, 7:48 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
530

Don't we need to check C or Zcf/Zcd for compressing FP loads and stores? They aren't compressible with just Zca.

534

Just to confirm, the vector passed to this function does not include the emergency spill slot scavenging slots which must be kept to close to sp/fp?

lcvon007 updated this revision to Diff 552963.Aug 23 2023, 7:55 PM

add a { } for if condition, NFC

lcvon007 added inline comments.Aug 23 2023, 8:55 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
534

yes, it has excluded it.

lcvon007 updated this revision to Diff 553124.Aug 24 2023, 7:28 AM

add function to check if a ld/st is compressible and remove
the check for c extension because it may improve the code size
even target doesn't support c extension.

lcvon007 added inline comments.Aug 24 2023, 7:45 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
530

I have added function to check whether lw/ld/flw/fld/sw/sd/fsw/fsd is compressible. thanks very much.

wangpc added inline comments.Aug 25 2023, 12:01 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
557

Do we really need these small inline functions? What about making them branches (manually inlining)?

585

Do we really need IsValid? It's always true I think (same for X86).

wangpc added inline comments.Aug 25 2023, 12:03 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
585

OK, ignore it. SortingObjects is with bigger size than ObjectsToAllocate.

wangpc added inline comments.Aug 25 2023, 12:37 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
568

It seems we can also reduce stack size? I think we can enable it by default, not only for optsize. Performance should be evaluated.

lcvon007 updated this revision to Diff 553824.Aug 27 2023, 7:42 PM

Inline compressible ld/st check function by hand. NFC

lcvon007 added inline comments.Aug 27 2023, 7:57 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
557

done, thanks for you

568

The use number of ld/st is static, and the running result may be different, so it's not sure whether we can improve the performance. I think it cannot always reduce stack size, such as,
object1: 4B, 4B aligned
object2: 4B, 4B aligned
object3: 8B, 8B aligned
the stack size will increase if the object order is object1 object3 object2.

lcvon007 updated this revision to Diff 554138.Aug 28 2023, 7:25 PM

combine check if lw/sw/ld/sd are comprssible and update testcase removing extra mattr 'c' because
reorder has effect even without C extension. NFC

please help review(reivew opinions have been done), thanks very much. @craig.topper @wangpc

wangpc added inline comments.Aug 29 2023, 8:38 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
478

Just a style taste. Can we replace this struct with lambda?

501

We don't need static_cast I think.

547

hasMinSize() means that we only enable this optimization in -Oz, not in -Os. Is this expected?
hasOptSize() is for both -Os and -Oz.

613

This should be tested. Please add a MIR test that uses FP.

615

Please add a LLVM_DEBUG to print the final frame just like AArch64.

lcvon007 updated this revision to Diff 554714.Aug 30 2023, 7:30 AM

Use lamda for sort function and add Debug codes for the result of frame reorder.

lcvon007 added inline comments.Aug 30 2023, 7:34 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
613

reorder-inst-compress.mir has tested in the second function.

lcvon007 added inline comments.Aug 30 2023, 7:40 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
478

done, thanks

547

Oz is expected and I use RISCVMakeCompressible.cpp as a reference, and do you know when we need to enable it in Os? is it that decreasing codesize much but regress the performace very less? @wangpc

lcvon007 added inline comments.Aug 30 2023, 7:43 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
501

The difference is that if we need considering the overflow of uint32 x uint32. I think we may keep it.

615

done, thanks

wangpc added inline comments.Aug 31 2023, 1:02 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547

Os is an optimization level for both GCC and LLVM, Oz is only for LLVM. For LLVM, Oz means extreme code size optimization, and Os will consider both code size and performance.
As for your patch, I think it can be enabled under Os since it seems that performance won't be impacted(?).

kito-cheng added inline comments.Aug 31 2023, 1:46 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547
wangpc added inline comments.Aug 31 2023, 2:03 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547

Oh! Thanks! I didn't know it!

lcvon007 updated this revision to Diff 554997.Aug 31 2023, 5:40 AM

enable opt in Oz and Os.

lcvon007 added inline comments.Aug 31 2023, 5:42 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
547

Enable this opt in Oz/Os now, thanks.

wangpc accepted this revision.Aug 31 2023, 6:39 AM

LGTM, it seems reasonable to me, but please wait for @craig.topper to see if there are more comments.

This revision is now accepted and ready to land.Aug 31 2023, 6:39 AM
reames requested changes to this revision.Aug 31 2023, 9:57 AM
reames added a subscriber: reames.
reames added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
470

The usual structure for this type of thing is to switch over opcode, and put the predicates in the case blocks.

491

I don't think this should be conditional on size optimization. Using compressed loads and stores for frequently accessed objects should improve performance or at least be neutral.

575

I think you're missing something really significant here. If we start with

i32, align 4
i8, align 1

And we reorder to:

i8, align 1
i32, align 4

There's an extra three bytes of padding added between the objects. This lengthens the total offset computation, and may push the second object out of the base+offset addressable range. With a badly chosen set of objects, you can also end up growing the stack very significantly due to all the extra padding.

I would suggest that you start by restricting yourself to reordering objects of the same alignment. (i.e. sort only within sets of object with equal alignment)

585

I think it would be much simpler to remove isValid and have a separate map from frame index to SortedObject index. Please rework this.

1514

This looks like a separate change, and should probably be it's own review with it's own tests.

Also, magic constants are bad. Why can't this be written in terms of StackAlign just like the non-compressed case just below?

This revision now requires changes to proceed.Aug 31 2023, 9:57 AM
lcvon007 updated this revision to Diff 555294.Sep 1 2023, 12:50 AM
  1. Use switch to implement isCompressibleLdOrSt
  2. refine sort alorithm from sorting the whole objects into sorting each group with same alignment size.
  3. use sperate map for frame index mapping.
lcvon007 added inline comments.Sep 1 2023, 1:02 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
575

I adjust the algorithm to sort the objects with same alignment size only(although I know it's not the best order).
I split the following frame objects into four groups rather than three groups,
alignment size of each objects: 1B 4B 4B 1B 4B
group0: 1B
group1: 4B 4B
group2: 1B
group3: 4B
and it will make sure that no extra padding is added.

we may get the smallest frame size if we sort the whole frame objects first, but different groups will have its order first, so it's hard to adjust object in group A before object in group B, or inversely.

lcvon007 added inline comments.Sep 1 2023, 1:05 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1514

remove it, and I will submit another commit for it.

lcvon007 added inline comments.Sep 1 2023, 1:07 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
470

done.

585

Rework yet, please help review. @reames thanks very much.

lcvon007 added inline comments.Sep 1 2023, 1:23 AM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
491

yes, I think it may improve the performance in much scenario, but it may regress the performance because we use static use number of object but not the runtime information. For example,
The original order of frame objects is : A, B (A is in base+ offset addressable range and B is not).
The final order after sorting is: B, A( B is in range but A is not)
and the running number of A is much more than B, so it may regress the performance in theory.

So is it ok to put this optimization under normal compiling mode? @reames

lcvon007 updated this revision to Diff 555790.Sep 4 2023, 7:30 PM

rebase main

lcvon007 added a comment.EditedSep 5 2023, 8:03 PM

hi reames, craig, may you help me review my patch again? review opinions have all been fixed, thanks very much @reames @craig.topper

lcvon007 added a comment.EditedSep 14 2023, 8:45 AM

hi reames, craig, may you help me review my patch again? review opinions have all been fixed, thanks very much @reames @craig.topper

May you help reivew? thanks very much @reames @craig.topper

craig.topper added inline comments.Sep 15 2023, 1:58 PM
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
467

return STI.hasStdExtCOrZca() || STI.hasStdExtZce()

473

return ((STI.hasStdExtC() || STI.hasStdExtZcf() || (STI.hasStdExtZce())) && (STI.getFLen() == 32)

474

C.FLW should still be compressible with FLen==64 I think. It requires the F extension, but the D extension doesn't disable it.

Or was FLen here supposed to be XLen?

480

Why STI.getFLen() <= 64? Shouldn't it be == 64 or >= 64? Though if you see an FLD or FSD then FLEN must be at least 64 so you can probably ignore that check.

return STI.hasStdExtC() || STI.hasStdExtZcd()

486

Make all the cases return and drop this return after the switch.

lcvon007 updated this revision to Diff 556934.Sep 18 2023, 1:49 AM
lcvon007 marked an inline comment as done.

update isCompressibleLdOrSt to fix craig.topper's opinions.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
467

fixed, thanks a a lot.

lcvon007 marked 3 inline comments as done.Sep 18 2023, 1:56 AM
lcvon007 added inline comments.
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
473

done.

474

Use XLen to replace FLen, thanks very much.

480

Have used XLen <=64 to replace FLen <=64 and the reason I add this condition is that FLD/FSD can be compressed in RV32DC and RV64DC only, please help review. @craig.topper thanks a lot.

486

done.

lcvon007 marked 3 inline comments as done.Sep 21 2023, 5:39 PM

ping

May you help me review again? and Phabricator will be read-only after October 1, thanks very much. @craig.topper
@reames