User Details
- User Since
- Aug 7 2023, 1:19 AM (7 w, 2 d)
Sun, Sep 24
Could you please review this again? Phabricator will become read-only after October 1. Thanks very much. @craig.topper
@reames
Thu, Sep 21
ping
Mon, Sep 18
Update isCompressibleLdOrSt to address craig.topper's review comments.
Thu, Sep 14
Tue, Sep 5
Hi reames, craig, could you please review my patch again? All review comments have been addressed. Thanks very much. @reames @craig.topper
Mon, Sep 4
rebase main
rebase master
Fri, Sep 1
- Use a switch to implement isCompressibleLdOrSt.
- Refine the sort algorithm: instead of sorting all objects together, sort each group of objects that share the same alignment size.
- Use a separate map for the frame index mapping.
Thu, Aug 31
Enable the optimization at Oz and Os.
Wed, Aug 30
Use a lambda for the sort function and add debug output for the result of the frame reorder.
Aug 28 2023
Please help review (the review comments have been addressed). Thanks very much. @craig.topper @wangpc
Combine the checks for whether lw/sw/ld/sd are compressible, and update the testcase to remove the extra mattr 'c', because
the reorder has an effect even without the C extension. NFC
Aug 27 2023
Inline compressible ld/st check function by hand. NFC
Aug 24 2023
Add a function to check whether a ld/st is compressible, and remove
the check for the C extension, because the change may improve code size
even when the target doesn't support the C extension.
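As a rough illustration of the kind of check described above (not the patch's actual isCompressibleLdOrSt), the sketch below tests whether an sp-relative offset fits the compressed stack-pointer forms of the RISC-V C extension: c.lwsp/c.swsp take a 6-bit unsigned immediate scaled by 4, and c.ldsp/c.sdsp one scaled by 8. The function name and its parameters are hypothetical, and register constraints are ignored.

```cpp
#include <cstdint>

// Hypothetical helper, for illustration only: returns true if an
// sp-relative load/store of `AccessBytes` bytes at byte offset `Offset`
// could use a compressed encoding (c.lwsp/c.swsp or c.ldsp/c.sdsp).
// Register constraints (e.g. rd != x0 for c.lwsp) are not modeled.
constexpr bool isCompressibleSPOffset(int64_t Offset, unsigned AccessBytes) {
  switch (AccessBytes) {
  case 4: // lw/sw: 6-bit unsigned immediate scaled by 4, i.e. 0..252
    return Offset >= 0 && Offset <= 252 && Offset % 4 == 0;
  case 8: // ld/sd: 6-bit unsigned immediate scaled by 8, i.e. 0..504
    return Offset >= 0 && Offset <= 504 && Offset % 8 == 0;
  default: // other access sizes have no sp-relative compressed form here
    return false;
  }
}
```

Under this model, objects placed at small sp-relative offsets are the ones whose accesses can shrink to 16-bit encodings, which is presumably why the order of frame objects matters for code size.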
Aug 23 2023
Add braces { } around the body of the if condition. NFC
Add a description of the cost model and fix the typo.
Code size data for the optimization at Oz (with the symbol table stripped).
Aug 17 2023
Change 2048 to 2047, since the largest positive offset that addi can encode is 2047.
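For context, addi takes a 12-bit signed immediate, so its range is asymmetric. The small check below (the helper name fitsSImm12 is made up for this note) shows why 2048 is out of range on the positive side while -2048 is still encodable.

```cpp
#include <cstdint>

// addi encodes a 12-bit signed immediate: the valid range is [-2048, 2047].
// `fitsSImm12` is a hypothetical name used only for this illustration.
constexpr bool fitsSImm12(int64_t Imm) {
  return Imm >= -2048 && Imm <= 2047;
}

static_assert(fitsSImm12(-2048), "-2048 fits in a single addi");
static_assert(fitsSImm12(2047), "2047 is the largest positive immediate");
static_assert(!fitsSImm12(2048), "+2048 does not fit in a single addi");
```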
Aug 16 2023
Add more comments in the code and remove the CSI.size check as craig.topper suggested.
Hi @craig.topper, the review comments you raised have been addressed. Could you check whether this commit is OK now?
Aug 15 2023
Refine the function names in the testcases and add one extra test for
a stack size in the range (2048 * 2 - StackAlign, 2048 * 2 + 512].
Add an extra compression condition.
rebase main
rebase main branch
Aug 14 2023
NFC, update testcase to simplify it.
As for performance, there is no obvious gain or regression (I only tested at O2, running SPEC CPU2006 with one copy). I also disassembled each binary and saw that they are almost the same, except for some cases like the following (no extra instructions are generated): @wangpc
- auipc + jalr => jal(optimized version)
- li a7, 7, slli a7, a7, 0xb => lui a7, 0x4, addiw a7, a7, -528(optimized version)
- addi a0, a0, -104 => mv a0, a0
Add Zca condition and change << into normal multiplication.
Aug 13 2023
Submit code changes to address craig.topper's review comments.
Aug 12 2023
It seems that the compiler doesn't generate the best code here. It could generate addi a0, a0, -256, just as it generates addi a0, a0, -2032, but instead it generates li a0, 31; slli a0, a0, 8 (31 << 8 = 2 * 4096 - 256), so I think there is another optimization opportunity here. @wangpc
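To double-check the arithmetic in the comment above, here is a small value-only model of the instructions involved. The helper functions are simplified stand-ins written for this note (no registers, no sign-extension corner cases), and the lui + addi sequence is only one possible alternative, not something taken from the patch.

```cpp
#include <cstdint>

// Simplified value-only models of the instructions (illustrative only).
constexpr int64_t li(int64_t Imm) { return Imm; }
constexpr int64_t slli(int64_t Rs, unsigned Shamt) { return Rs << Shamt; }
constexpr int64_t lui(int64_t Imm20) { return Imm20 << 12; } // upper 20 bits
constexpr int64_t addi(int64_t Rs, int64_t Imm12) { return Rs + Imm12; }

// li a0, 31 ; slli a0, a0, 8
constexpr int64_t ViaShift = slli(li(31), 8);
// lui a0, 2 ; addi a0, a0, -256
constexpr int64_t ViaLui = addi(lui(2), -256);

static_assert(ViaShift == 7936, "31 << 8 is 7936");
static_assert(ViaLui == 2 * 4096 - 256, "lui/addi gives 2 * 4096 - 256");
static_assert(ViaShift == ViaLui, "both sequences produce the same constant");
```

This only confirms that the constant equals 2 * 4096 - 256; whether a shorter sequence is actually available depends on the surrounding sp adjustment, which is not modeled here.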
Aug 10 2023
Hi, I saw some performance degradation with my first implementation, so I updated the logic to adjust the FirstSP amount conservatively so that it does not add extra instructions. Please help review. Thanks very much. @wangpc
Update the method for adjusting the FirstSP amount to avoid
adding extra insts.
Aug 8 2023
Code size before and after the optimization on SPEC CPU2006:
400.perlbench: 1146568, 1145352, decrease 1216
403.gcc: 3556640, 3555280, decrease 1360
445.gobmk: 4437552, 4433280, decrease 4272
464.h264ref: 586776, 585304, decrease 1472
483.xalancbmk: 8011104, 8008944, decrease 2160
435.gromacs: 870344, 867832, decrease 2512
444.namd: 212224, 207936, decrease 4288
447.dealII: 4741400, 4736792, decrease 4608
453.povray: 1075832, 1073400, decrease 2432