This is an archive of the discontinued LLVM Phabricator instance.

Differential D157373

[RISCV] add a compress optimization for stack inst.
Closed · Public

Authored by lcvon007 on Aug 8 2023, 1:34 AM.

Details

Reviewers

shiva0217
wangpc
asb
craig.topper

Commits

rG13454a6e8744: [RISCV] Compress stack insts by adjust offset.

Summary

Saves and restores of callee-saved registers mostly use the
following instruction patterns:

sw rs2, offset(x2)
sd rs2, offset(x2)
fsw rs2, offset(x2)
fsd rs2, offset(x2)
lw rd, offset(x2)
ld rd, offset(x2)
flw rd, offset(x2)
fld rd, offset(x2)

The offset decides whether these instructions can be compressed.
Currently, if the stack size does not fit in a signed 12-bit
immediate (that is, it is larger than 2^11-1), an offset of 2032 is
used by default to save and restore the callee-saved registers, which
prevents all of the callee-saved stack instructions from being
compressed.

Allocating a suitable offset for the stack instructions helps to
decrease code size and improve performance. This patch adds an
option, riscv-compress-stack-inst, to control whether to do this
optimization.
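
As a rough illustration of why the offset matters, the sketch below checks whether an sp-relative load/store offset fits the compressed encodings. It assumes the standard C-extension immediates (a 6-bit unsigned field scaled by the access size, giving 0..252 for 4-byte accesses and 0..504 for 8-byte accesses). The helper name `fitsCompressedSPOffset` is hypothetical and is not part of LLVM.

```cpp
#include <cstdint>

// Hypothetical helper (not an LLVM API): returns true if `Offset` could be
// encoded by a compressed sp-relative load/store, i.e. c.lwsp/c.swsp for
// 4-byte accesses or c.ldsp/c.sdsp for 8-byte accesses.
// Assumption: the immediate is a 6-bit unsigned field scaled by the access
// size, as in the RISC-V C extension.
constexpr bool fitsCompressedSPOffset(int64_t Offset, unsigned AccessBytes) {
  if (AccessBytes != 4 && AccessBytes != 8)
    return false;
  const int64_t Size = AccessBytes;
  const int64_t Limit = 64 * Size; // 256 for 4-byte, 512 for 8-byte accesses
  return Offset >= 0 && Offset < Limit && Offset % Size == 0;
}

// Offsets just below a 2032-byte first adjustment are out of range, while
// offsets just below 256 (RV32) or 512 (RV64) fit.
static_assert(!fitsCompressedSPOffset(2028, 4), "sw ra, 2028(sp)");
static_assert(fitsCompressedSPOffset(252, 4), "sw ra, 252(sp)");
static_assert(!fitsCompressedSPOffset(2024, 8), "sd ra, 2024(sp)");
static_assert(fitsCompressedSPOffset(504, 8), "sd ra, 504(sp)");
```

Under these assumptions, callee-saved slots placed directly below a first SP adjustment of 256 (RV32) or 512 (RV64) land inside the compressible range.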

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

lcvon007 created this revision. · Aug 8 2023, 1:34 AM

Herald added a project: Restricted Project. · Aug 8 2023, 1:34 AM

Herald added subscribers: jobnoorman, luke, VincentWu and 26 others.

lcvon007 requested review of this revision. · Aug 8 2023, 1:34 AM

Herald added a project: Restricted Project. · Aug 8 2023, 1:34 AM

Herald added subscribers: llvm-commits, wangpc, eopXD, MaskRay.

Harbormaster completed remote builds in B251018: Diff 548102. · Aug 8 2023, 3:30 AM

Code size before and after the optimization on SPEC CPU2006:

Benchmark        Before     After      Decrease
400.perlbench    1146568    1145352    1216
403.gcc          3556640    3555280    1360
445.gobmk        4437552    4433280    4272
h264ref          586776     585304     1472
483.xalancbmk    8011104    8008944    2160
435.gromacs      870344     867832     2512
444.namd         212224     207936     4288
447.dealII       4741400    4736792    4608
453.povray       1075832    1073400    2432

Have you evaluated impact on performance? It seems we will generate more instructions to adjust stack.

In D157373#4568990, @wangpc wrote:

Have you evaluated impact on performance? It seems we will generate more instructions to adjust stack.

It may add two instructions in total, to help build the large immediate in the prologue and epilogue. I see an improvement of about 1%-2% in some cases and some performance degradation in others, and I will try to provide detailed data after more testing.

lcvon007 edited reviewers, added: wangpc; removed: frasercrmck. · Aug 8 2023, 8:32 PM

Herald added a subscriber: frasercrmck. · Aug 8 2023, 8:32 PM

wangpc added reviewers: asb, craig.topper. · Aug 9 2023, 7:29 AM

Update the method to adjust the FirstSP amount to avoid
adding extra insts.

Hi, I saw some performance degradation with my first implementation, so I updated the logic to adjust the FirstSP amount conservatively so that it does not add extra instructions. Please help review, thanks very much. @wangpc

Harbormaster completed remote builds in B251681: Diff 549026. · Aug 10 2023, 11:58 AM

In D157373#4576850, @lcvon007 wrote:

Hi, I saw some performance degradation with my first implementation, so I updated the logic to adjust the FirstSP amount conservatively so that it does not add extra instructions. Please help review, thanks very much. @wangpc

Can you show the improvement on code size/performance of your new implementation (though the old one should still be valid)?
LGTM in general and I think it is really nice to have such optimization!

I just found a regression:

--- a/llvm/test/CodeGen/RISCV/stack-realignment.ll
+++ b/llvm/test/CodeGen/RISCV/stack-realignment.ll
@@ -1,7 +1,7 @@
 ; NOTE: Assertions have been autogenerated by utils/update_llc_test_checks.py
-; RUN: llc -mtriple=riscv32 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv32 -mattr=+c -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV32I
-; RUN: llc -mtriple=riscv64 -verify-machineinstrs < %s \
+; RUN: llc -mtriple=riscv64 -mattr=+c -verify-machineinstrs < %s \
 ; RUN:   | FileCheck %s -check-prefix=RV64I
 
 declare void @callee(ptr)
@@ -529,56 +529,58 @@ define void @caller_no_realign2048() "no-realign-stack" {
 define void @caller4096() {
 ; RV32I-LABEL: caller4096:
 ; RV32I:       # %bb.0:
-; RV32I-NEXT:    addi sp, sp, -2032
-; RV32I-NEXT:    .cfi_def_cfa_offset 2032
-; RV32I-NEXT:    sw ra, 2028(sp) # 4-byte Folded Spill
-; RV32I-NEXT:    sw s0, 2024(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    addi sp, sp, -256
+; RV32I-NEXT:    .cfi_def_cfa_offset 256
+; RV32I-NEXT:    sw ra, 252(sp) # 4-byte Folded Spill
+; RV32I-NEXT:    sw s0, 248(sp) # 4-byte Folded Spill
 ; RV32I-NEXT:    .cfi_offset ra, -4
 ; RV32I-NEXT:    .cfi_offset s0, -8
-; RV32I-NEXT:    addi s0, sp, 2032
+; RV32I-NEXT:    addi s0, sp, 256
 ; RV32I-NEXT:    .cfi_def_cfa s0, 0
-; RV32I-NEXT:    lui a0, 2
-; RV32I-NEXT:    addi a0, a0, -2032
+; RV32I-NEXT:    li a0, 31
+; RV32I-NEXT:    slli a0, a0, 8
 ; RV32I-NEXT:    sub sp, sp, a0
 ; RV32I-NEXT:    srli a0, sp, 12
 ; RV32I-NEXT:    slli sp, a0, 12
 ; RV32I-NEXT:    lui a0, 1
-; RV32I-NEXT:    add a0, sp, a0
+; RV32I-NEXT:    add a0, a0, sp
 ; RV32I-NEXT:    call callee@plt
 ; RV32I-NEXT:    lui a0, 2
 ; RV32I-NEXT:    sub sp, s0, a0
-; RV32I-NEXT:    addi a0, a0, -2032
+; RV32I-NEXT:    li a0, 31
+; RV32I-NEXT:    slli a0, a0, 8
 ; RV32I-NEXT:    add sp, sp, a0
-; RV32I-NEXT:    lw ra, 2028(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    lw s0, 2024(sp) # 4-byte Folded Reload
-; RV32I-NEXT:    addi sp, sp, 2032
+; RV32I-NEXT:    lw ra, 252(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    lw s0, 248(sp) # 4-byte Folded Reload
+; RV32I-NEXT:    addi sp, sp, 256
 ; RV32I-NEXT:    ret
 ;
 ; RV64I-LABEL: caller4096:
 ; RV64I:       # %bb.0:
-; RV64I-NEXT:    addi sp, sp, -2032
-; RV64I-NEXT:    .cfi_def_cfa_offset 2032
-; RV64I-NEXT:    sd ra, 2024(sp) # 8-byte Folded Spill
-; RV64I-NEXT:    sd s0, 2016(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    addi sp, sp, -512
+; RV64I-NEXT:    .cfi_def_cfa_offset 512
+; RV64I-NEXT:    sd ra, 504(sp) # 8-byte Folded Spill
+; RV64I-NEXT:    sd s0, 496(sp) # 8-byte Folded Spill
 ; RV64I-NEXT:    .cfi_offset ra, -8
 ; RV64I-NEXT:    .cfi_offset s0, -16
-; RV64I-NEXT:    addi s0, sp, 2032
+; RV64I-NEXT:    addi s0, sp, 512
 ; RV64I-NEXT:    .cfi_def_cfa s0, 0
-; RV64I-NEXT:    lui a0, 2
-; RV64I-NEXT:    addiw a0, a0, -2032
+; RV64I-NEXT:    li a0, 15
+; RV64I-NEXT:    slli a0, a0, 9
 ; RV64I-NEXT:    sub sp, sp, a0
 ; RV64I-NEXT:    srli a0, sp, 12
 ; RV64I-NEXT:    slli sp, a0, 12
 ; RV64I-NEXT:    lui a0, 1
-; RV64I-NEXT:    add a0, sp, a0
+; RV64I-NEXT:    add a0, a0, sp
 ; RV64I-NEXT:    call callee@plt
 ; RV64I-NEXT:    lui a0, 2
 ; RV64I-NEXT:    sub sp, s0, a0
-; RV64I-NEXT:    addiw a0, a0, -2032
+; RV64I-NEXT:    li a0, 15
+; RV64I-NEXT:    slli a0, a0, 9
 ; RV64I-NEXT:    add sp, sp, a0
-; RV64I-NEXT:    ld ra, 2024(sp) # 8-byte Folded Reload
-; RV64I-NEXT:    ld s0, 2016(sp) # 8-byte Folded Reload
-; RV64I-NEXT:    addi sp, sp, 2032
+; RV64I-NEXT:    ld ra, 504(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    ld s0, 496(sp) # 8-byte Folded Reload
+; RV64I-NEXT:    addi sp, sp, 512
 ; RV64I-NEXT:    ret
   %1 = alloca i8, align 4096
   call void @callee(ptr %1)

I think that is because we need to do a larger second stack adjustment if we do a small first stack adjustment.
So the impact on performance should be evaluated so that we can decide whether this optimization should be enabled under -Os/-Oz only. :-)
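
To make the trade-off concrete, here is a minimal sketch of the split for the caller4096 case in the diff above. The total frame size of 8192 bytes is an assumption inferred from the assembly (2032 plus the 6160 built by lui/addi in the old code); it is not stated in the review. The helper names are hypothetical.

```cpp
#include <cstdint>

// Remaining SP adjustment once the first adjustment has been applied.
constexpr int64_t secondSPAdjust(int64_t StackSize, int64_t FirstSPAdjust) {
  return StackSize - FirstSPAdjust;
}

// True if `Imm` fits the signed 12-bit immediate of a single addi.
constexpr bool fitsAddi(int64_t Imm) { return Imm >= -2048 && Imm <= 2047; }

// Assumed frame size for caller4096, inferred from the assembly.
constexpr int64_t AssumedStackSize = 8192;

static_assert(secondSPAdjust(AssumedStackSize, 2032) == 6160, "old split");
static_assert(secondSPAdjust(AssumedStackSize, 256) == 7936, "new RV32 split");
static_assert(secondSPAdjust(AssumedStackSize, 512) == 7680, "new RV64 split");
// None of these fit a single addi, so a register has to be materialized.
static_assert(!fitsAddi(6160) && !fitsAddi(7936) && !fitsAddi(7680), "");
```

A smaller first adjustment always leaves a larger remainder, so whether it costs extra instructions depends on how cheaply that remainder can be built.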

It seems that the compiler doesn't generate the best code here. It could generate addi a0, a0, -256 after lui a0, 2, just as it generates addi a0, a0, -2032, but instead it generates li a0, 31, slli a0, a0, 8 (31 << 8 = 2 * 4096 - 256), so I think there is another optimization opportunity here. @wangpc

I found that my optimization adds one extra instruction because of the difference in how the large immediates 8192 - 2032 and 8192 - 512 are built:
lui a0, 2, addiw a0, a0, -2032 is for 8192 - 2032
li a0, 15, slli a0, a0, 9 is for 8192 - 512,
and lui a0, 2 will be optimized later, so if 8192 - 512 were built with lui a0, 2, addiw a0, a0, -512, the result would be similar.
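
The arithmetic behind the two instruction sequences can be checked with a small sketch. `splitLuiAddi` is a hypothetical helper that splits a small non-negative constant into lui/addi operands. It is only an illustration and is not LLVM's actual constant materialization logic.

```cpp
#include <cstdint>

// A lui+addi pair computes (Hi << 12) + Lo, with Lo in [-2048, 2047].
struct LuiAddi {
  int32_t Hi; // operand of lui
  int32_t Lo; // operand of addi/addiw
};

// Hypothetical helper: split a small non-negative constant into the lui and
// addi operands by rounding to the nearest multiple of 4096.
constexpr LuiAddi splitLuiAddi(int32_t Value) {
  const int32_t Hi = (Value + 0x800) >> 12;
  return {Hi, Value - Hi * 4096};
}

// The li+slli sequences from the regression build the same constants.
static_assert((31 << 8) == 2 * 4096 - 256, "RV32 second adjustment");
static_assert((15 << 9) == 2 * 4096 - 512, "RV64 second adjustment");

// All three constants share `lui a0, 2`; only the addi operand differs.
static_assert(splitLuiAddi(8192 - 2032).Hi == 2 &&
              splitLuiAddi(8192 - 2032).Lo == -2032, "");
static_assert(splitLuiAddi(8192 - 256).Hi == 2 &&
              splitLuiAddi(8192 - 256).Lo == -256, "");
static_assert(splitLuiAddi(8192 - 512).Hi == 2 &&
              splitLuiAddi(8192 - 512).Lo == -512, "");
```

Because the epilogue already needs `lui a0, 2`, the lui/addi form can reuse that register, whereas the li/slli form cannot.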

I show the code size data here first and will provide the performance data later:

codesize:

craig.topper added inline comments. · Aug 12 2023, 10:40 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1315	Can we compute CompressLen for both RV32 and RV64 as XLen * 8? And then merge the 2 if statements?
1316	c.ldsp?
1321	return 256
1324	return 512
1326	return 2048 - StackAlign
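
The suggestion to derive the compressible range from XLen can be sketched as follows. `compressLenFor` is a hypothetical name used only for this illustration; the patch itself may spell this differently.

```cpp
#include <cstdint>

// Sketch of the reviewer's suggestion: compute the compressible offset
// range from XLen instead of hard-coding 256 and 512.
constexpr uint64_t compressLenFor(unsigned XLen) { return XLen * 8; }

// RV32: 6-bit immediate scaled by 4 => 2^(6+2) = 256.
static_assert(compressLenFor(32) == 256, "RV32");
// RV64: 6-bit immediate scaled by 8 => 2^(6+3) = 512.
static_assert(compressLenFor(64) == 512, "RV64");
```

With a single value, the two XLen-specific branches collapse into one condition.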

Submit code addressing craig.topper's review comments.

Harbormaster completed remote builds in B252173: Diff 549689. · Aug 13 2023, 3:11 AM

craig.topper added inline comments. · Aug 13 2023, 11:40 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1313	What about Zca?
1322	Use `* 8`. Let the compiler convert it to a shift.

In D157373#4579136, @wangpc wrote:

I just found a regression:

It seems that the compiler doesn't generate the best code here. It could generate addi a0, a0, -256 after lui a0, 2, just as it generates addi a0, a0, -2032, but instead it generates li a0, 31, slli a0, a0, 8 (31 << 8 = 2 * 4096 - 256). @wangpc

I found that my optimization adds one extra instruction because of the difference in how the large immediates 8192 - 2032 and 8192 - 512 are built:
lui a0, 2, addiw a0, a0, -2032 is for 8192 - 2032
li a0, 15, slli a0, a0, 9 is for 8192 - 512,
and lui a0, 2 will be optimized later, so if 8192 - 512 were built with lui a0, 2, addiw a0, a0, -512, the result would be similar and this regression may be avoided.

I show the code size data here first and will provide the performance data later:

codesize:

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1315	done, thanks for your nice advice
1316	done, add other related instructions case
1326	done

wangpc added inline comments. · Aug 14 2023, 3:04 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	Do we really need this condition `(StackSize <= RVCompressLen + 2048 || StackSize > 2048 * 3 - StackAlign)`?

Add Zca condition and change << into normal multiplication.

lcvon007 added a comment. · Aug 14 2023, 6:25 AM

This comment was removed by lcvon007.

lcvon007 added inline comments. · Aug 14 2023, 6:29 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1322	done
1324	> Do we really need this condition `(StackSize <= RVCompressLen + 2048 || StackSize > 2048 * 3 - StackAlign)`?
	As you saw in the first version of the implementation, without this condition extra instructions are added (because the second SP amount may become so large that more instructions are needed to build the immediate), and performance may regress in some cases. It would be better to remove this condition if we only want to optimize code size.

lcvon007 marked an inline comment as done. · Aug 14 2023, 6:34 AM

lcvon007 added inline comments.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1313	I agree with you about Zca, and have added it already.

For the program's performance, there is no obvious gain or regression (I only tested at O2, running one copy of SPEC CPU2006). I also disassembled each binary and saw that they are almost the same except for some cases like the following (no extra instructions are generated): @wangpc

auipc + jalr => jal(optimized version)
li a7, 7, slli a7, a7, 0xb => lui a7, 0x4, addiw a7, a7, -528(optimized version)
addi a0, a0, -104 => mv a0, a0

Harbormaster completed remote builds in B252329: Diff 549901. · Aug 14 2023, 8:18 AM

wangpc added inline comments. · Aug 14 2023, 9:17 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	If so, then I think we can loosen the condition when optimizing for size? What do you think about it?

lcvon007 added inline comments. · Aug 14 2023, 4:40 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	Yes. Is it OK to add this feature in another commit? I actually have other ideas for decreasing code size too, so I can try them together.

LGTM but please let @craig.topper do the final approval.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	Is `2048 * 3` an empirical value? I'm sorry that I can't figure out the reason. If we want second adjustment to be larger than 2048 , should it be `2048*2-StackAlign`?
llvm/test/CodeGen/RISCV/stack-inst-compress.mir
14	Nit: the LLVM IR in a MIR test can be just function stub, the function body can be removed.

This revision is now accepted and ready to land. · Aug 14 2023, 8:35 PM

lcvon007 added inline comments. · Aug 14 2023, 9:03 PM

llvm/test/CodeGen/RISCV/stack-inst-compress.mir
14	Do you mean I should provide only a declaration here, like: declare dso_local void @_Z18caller_small_stackv()? With that, the compiler reports the error "basic block 'entry' is not defined in the function '_Z18caller_small_stackv'". Does it need other changes too, or should I keep the body here as it is now?

wangpc added inline comments. · Aug 14 2023, 9:31 PM

llvm/test/CodeGen/RISCV/stack-inst-compress.mir

--- a/llvm/test/CodeGen/RISCV/stack-inst-compress.mir
+++ b/llvm/test/CodeGen/RISCV/stack-inst-compress.mir
@@ -10,23 +10,13 @@
 --- |
   define dso_local void @_Z18caller_small_stackv() {
   entry:
-    %arr = alloca [517 x i32], align 4
-    call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 2068, i1 false)
-    %arraydecay = getelementptr inbounds [517 x i32], ptr %arr, i64 0, i64 0
-    call void @_Z6calleePi(ptr noundef %arraydecay)
     ret void
   }
 
-  declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
-
   declare dso_local void @_Z6calleePi(ptr noundef)
 
   define dso_local void @_Z19caller_larger_stackv() {
   entry:
-    %arr = alloca [1536 x i32], align 4
-    call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 6144, i1 false)
-    %arraydecay = getelementptr inbounds [1536 x i32], ptr %arr, i64 0, i64 0
-    call void @_Z6calleePi(ptr noundef %arraydecay)
     ret void
   }
 
@@ -40,7 +30,7 @@ frameInfo:
   hasCalls:        true
   localFrameSize:  2068
 stack:
-  - { id: 0, name: arr, size: 2068, alignment: 4, local-offset: -2068 }
+  - { id: 0, size: 2068, alignment: 4, local-offset: -2068 }
   - { id: 1, type: spill-slot, size: 8, alignment: 8 }
 machineFunctionInfo:
   varArgsFrameIndex: 0
@@ -93,7 +83,7 @@ body:             |
     ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
     renamable $x10 = LUI 1
     renamable $x12 = ADDIW killed renamable $x10, -2028
-    renamable $x10 = ADDI %stack.0.arr, 0
+    renamable $x10 = ADDI %stack.0, 0
     SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
     renamable $x11 = COPY $x0
     PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10
@@ -115,7 +105,7 @@ frameInfo:
   hasCalls:        true
   localFrameSize:  6144
 stack:
-  - { id: 0, name: arr, size: 6144, alignment: 4, local-offset: -6144 }
+  - { id: 0, size: 6144, alignment: 4, local-offset: -6144 }
   - { id: 1, type: spill-slot, size: 8, alignment: 8 }
 machineFunctionInfo:
   varArgsFrameIndex: 0
@@ -184,7 +174,7 @@ body:             |
     ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
     renamable $x10 = ADDI $x0, 3
     renamable $x12 = SLLI killed renamable $x10, 11
-    renamable $x10 = ADDI %stack.0.arr, 0
+    renamable $x10 = ADDI %stack.0, 0
     SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
     renamable $x11 = COPY $x0
     PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10

And if you are using utils/update_mir_test_checks.py to generate CHECKs, don't remove the output lines. Or you should remove the line 1:

# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2

NFC, update testcase to simplify it.

lcvon007 added inline comments. · Aug 14 2023, 10:19 PM

llvm/test/CodeGen/RISCV/stack-inst-compress.mir
14	Done, I use update_mir_test_checks.py and remove some checkers, and I have removed the line 1 as you suggest, thanks very much.

Harbormaster completed remote builds in B252538: Diff 550192. · Aug 15 2023, 1:09 AM

rebase main branch

Harbormaster completed remote builds in B252582: Diff 550252. · Aug 15 2023, 4:27 AM

rebase main

lcvon007 marked an inline comment as done. · Aug 15 2023, 7:34 AM

lcvon007 marked 3 inline comments as done.

lcvon007 added inline comments. · Aug 15 2023, 8:30 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	[2048, 2048 + 512]: before opt: addi + addi, after opt: addi + addi
	(2048 + 512, 2048 * 2 - StackAlign]: before opt: addi + addi, after opt: addi + addi + addi: not good
	(2048 * 2 - StackAlign, 2048 * 2 + 512]: before opt: addi + addi + addi, after opt: addi + addi + addi
	(2048 * 2 + 512, 2048 * 3 - StackAlign]: before opt: addi + addi + addi, after opt: addi + lui + addi + addi sp, sp, x?
	(2048 * 3 - StackAlign, +inf): before opt: addi + lui + addi + addi + addi sp, sp, x?, after opt: addi + lui + addi + addi sp, sp, x?
	It's not good to use '2048 * 2 - StackAlign', but I may add an extra condition like: if (StackSize <= RVCompressLen + 2048 || (StackSize > 2048 * 2 - StackAlign && StackSize <= 2048 * 2 + 512) || StackSize > 2048 * 3 - StackAlign)

add extra compression condition.

Harbormaster completed remote builds in B252660: Diff 550356. · Aug 15 2023, 9:51 AM

wangpc added inline comments. · Aug 15 2023, 7:58 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	OK, I get it. Can we add more tests for different stack sizes as you said? And some suggestions on the test: 1) the MIR in the test can be further reduced (for example, is the call to `memset` really needed?), 2) rename the functions to `stack_size_n` where `n` is the stack size.

lcvon007 marked an inline comment as done. · Aug 15 2023, 8:55 PM

Refine the function names in the test cases and add one extra test for
a stack size in (2048 * 2 - StackAlign, 2048 * 2 + 512].

lcvon007 added inline comments. · Aug 15 2023, 10:29 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	1. I have added a new function to the test cases for the newly added condition; 2. I removed the memset and renamed the functions. I have two questions and ask for your help: the CI has failed although the tests pass locally, do you know why it fails? Also, I don't have commit access now; could you help me submit this patch when it's OK? Thanks very much.

Harbormaster completed remote builds in B252853: Diff 550612. · Aug 16 2023, 12:30 AM

Hi @craig.topper, the review comments you raised have been addressed. Could you help check whether this commit is OK now?

craig.topper added inline comments. · Aug 16 2023, 11:56 AM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1304	This comment is about the code after the compressed handling. Should it be moved down?
1324	Isn't CSI.size() the number of registers being saved? RVCompressLen is 256 or 512. Won't CSI.size() always be less than that? They aren't the same units.

add more comments in the code and remove the CSI.size check as craig.topper says.

lcvon007 added inline comments. · Aug 16 2023, 7:30 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1304	I added 'at most' to the comment but did not move it down, because 2048 - StackAlign is the background for the code that follows, so I think it's better to leave it here as before. I also added more comments. Is that OK?

lcvon007 marked an inline comment as done. · Aug 16 2023, 7:32 PM

lcvon007 added inline comments. · Aug 16 2023, 7:35 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	You're right, they aren't the same units, and CSI.size() is always less than RVCompressLen. I have removed this check, thanks for your advice.

Harbormaster completed remote builds in B253099: Diff 550962. · Aug 16 2023, 8:30 PM

craig.topper added inline comments. · Aug 16 2023, 11:10 PM

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	I think `2048 * 2 - StackAlign` was not addi, addi with the old code. The FirstSPAdjust would be 2048-StackAlign or 2040. That leaves 2048 left, that can't be represented by an ADDI. So I think `2048 * 2 - StackAlign` can use the compressed code? I didn't recheck the other boundaries yet.

I think 2048 * 2 + 512 is not 3 addis with the new code. After adding 512, we can't do the remaining 4096 in two addis. We can only do a maximum of 4094 with 2 addis.
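
The bound mentioned here follows directly from addi's signed 12-bit immediate. A minimal sketch of that arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// Largest positive and negative adjustments reachable with N addi
// instructions, given addi's immediate range of [-2048, 2047].
constexpr int64_t maxPositiveWithAddis(unsigned N) {
  return 2047 * static_cast<int64_t>(N);
}
constexpr int64_t maxNegativeWithAddis(unsigned N) {
  return -2048 * static_cast<int64_t>(N);
}

// Two addis can add at most 4094 to sp in the epilogue.
static_assert(maxPositiveWithAddis(2) == 4094, "");
// A stack size of 2048 * 2 + 512 leaves 4096 after the first 512-byte
// adjustment, which is out of reach for two addis.
static_assert(2048 * 2 + 512 - 512 > maxPositiveWithAddis(2), "");
// The prologue subtracts, so two addis can cover up to 4096 there.
static_assert(maxNegativeWithAddis(2) == -4096, "");
```

This asymmetry between the prologue (negative immediates) and the epilogue (positive immediates) is why the boundaries move from 2048 to 2047.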

craig.topper requested changes to this revision. · Aug 16 2023, 11:13 PM

This revision now requires changes to proceed. · Aug 16 2023, 11:13 PM

update 2048 into 2047 considering the positive offset of addi.

In D157373#4594385, @craig.topper wrote:

I think 2048 * 2 + 512 is not 3 addis with the new code. After adding 512, we can't do the remaining 4096 in two addis. We can only do a maximum of 4094 with 2 addis.

Yes, I did not consider that the largest positive offset is only 2047. I think we can list the cases as follows (using an RVCompressLen of 512 as the example).
The instruction counts cover the prologue and epilogue together.

1: [2048, 2047 + 512]: before opt: (addi + addi) x 2, after opt: (addi + addi) x 2: good
2: 2048 + 512: before opt: (addi + addi) x 2, after opt: (addi + addi) + addi + addi + addi: not good
3: (2048 + 512, 2048 - StackAlign + 2047]: before opt: (addi + addi) x 2, after opt: (addi + addi + addi) x 2: not good
4: 2048 - StackAlign + 2048: before opt: (addi + addi) + (addi + addi + addi), after opt: (addi + addi + addi) x 2: not good
5: (2048 - StackAlign + 2048, 2047 * 2 + 512]: before opt: (addi + addi + addi) x 2, after opt: (addi + addi + addi) x 2: good
6: (2047 * 2 + 512, 2048 * 2 + 512]: before opt: (addi + addi + addi) x 2, after opt: (addi + addi + addi) + (addi + lui + addi + add): not good
7: (2048 * 2 + 512, 2048 - StackAlign + 2047 * 2]: before opt: (addi + addi + addi) x 2, after opt: (addi + lui + addi + add) x 2: not good
8: 2048 - StackAlign + 2048 + 2047: before opt: (addi + addi + addi) + addi + lui + addi + add, after opt: (addi + lui + addi + add) x 2: not good
9: 2048 - StackAlign + 2048 + 2048: before opt: (addi + addi + addi) + addi + lui + addi + add, after opt: (addi + lui + addi + add) x 2: not good
10: (2048 * 3 - StackAlign, +inf): before opt: (addi + lui + addi + addi + add) x 2, after opt: (addi + lui + addi + add) x 2: good

So the new condition needs to be:
if (StackSize <= 2047 + RVCompressLen || (StackSize > 2048 * 2 - StackAlign && StackSize <= 2047 * 2 + RVCompressLen) || StackSize > 2048 * 3 - StackAlign)
I have adjusted it; please help review (thanks very much).
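
The condition quoted in this comment can be written as a standalone predicate to check the boundaries. This is only a sketch of the condition as stated in the comment, not the committed code; the function name `worthCompressing` is hypothetical, and it assumes the caller has already established that StackSize does not fit in a signed 12-bit immediate.

```cpp
#include <cstdint>

// Sketch of the condition from the review comment. Returns true for stack
// sizes where the smaller first SP adjustment is listed as "good", i.e. it
// does not add instructions to the prologue/epilogue.
// Assumption: the caller already checked that StackSize >= 2048.
constexpr bool worthCompressing(uint64_t StackSize, uint64_t RVCompressLen,
                                uint64_t StackAlign) {
  return StackSize <= 2047 + RVCompressLen ||
         (StackSize > 2048 * 2 - StackAlign &&
          StackSize <= 2047 * 2 + RVCompressLen) ||
         StackSize > 2048 * 3 - StackAlign;
}

// Boundaries for RV64 (RVCompressLen = 512) with a 16-byte stack alignment.
static_assert(worthCompressing(2559, 512, 16), "case 1: upper bound");
static_assert(!worthCompressing(2560, 512, 16), "case 2");
static_assert(!worthCompressing(4080, 512, 16), "case 4");
static_assert(worthCompressing(4096, 512, 16), "case 5");
static_assert(!worthCompressing(4608, 512, 16), "case 6: upper bound");
static_assert(!worthCompressing(6128, 512, 16), "case 9");
static_assert(worthCompressing(6144, 512, 16), "case 10");
```

The three disjuncts correspond to cases 1, 5 and 10 in the list; every other case is marked "not good" and keeps the default first adjustment.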

Harbormaster completed remote builds in B253155: Diff 551045. · Aug 17 2023, 2:43 AM

Herald added a subscriber: sunshaoce. · Aug 17 2023, 2:43 AM

LGTM

This revision is now accepted and ready to land. · Aug 17 2023, 8:03 AM

lcvon007 marked 2 inline comments as done. · Aug 17 2023, 6:18 PM

lcvon007 marked an inline comment as done. · Aug 17 2023, 6:21 PM

lcvon007 added inline comments.

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp
1324	2048 * 2 - StackAlign is excluded already.

lcvon007 marked 2 inline comments as done. · Aug 17 2023, 6:21 PM

Closed by commit rG13454a6e8744: [RISCV] Compress stack insts by adjust offset. (authored by laichunfeng <laichunfeng@tencent.com>, committed by DamonFool). · Aug 17 2023, 8:01 PM

This revision was automatically updated to reflect the committed changes.

DamonFool added a commit: rG13454a6e8744: [RISCV] Compress stack insts by adjust offset..

Revision Contents

Path                                               Size
llvm/lib/Target/RISCV/RISCVFrameLowering.cpp       18 lines
llvm/test/CodeGen/RISCV/stack-inst-compress.mir    204 lines

Diff 549026

llvm/lib/Target/RISCV/RISCVFrameLowering.cpp

Show First 20 Lines • Show All 1,295 Lines • ▼ Show 20 Lines	RISCVFrameLowering::getFirstSPAdjustAmount(const MachineFunction &MF) const {
// registers will be pushed by the save-restore libcalls, so we don't have to		// registers will be pushed by the save-restore libcalls, so we don't have to
// split the SP adjustment in this case.		// split the SP adjustment in this case.
if (RVFI->getLibCallStackSize() \|\| RVFI->getRVPushStackSize())		if (RVFI->getLibCallStackSize() \|\| RVFI->getRVPushStackSize())
return 0;		return 0;

// Return the FirstSPAdjustAmount if the StackSize can not fit in a signed		// Return the FirstSPAdjustAmount if the StackSize can not fit in a signed
// 12-bit and there exists a callee-saved register needing to be pushed.		// 12-bit and there exists a callee-saved register needing to be pushed.
if (!isInt<12>(StackSize) && (CSI.size() > 0)) {		if (!isInt<12>(StackSize) && (CSI.size() > 0)) {
// FirstSPAdjustAmount is chosen as (2048 - StackAlign) because 2048 will		// FirstSPAdjustAmount is chosen as (2048 - StackAlign) because 2048 will
		craig.topperUnsubmitted Done Reply Inline Actions This comment is about the code after the compressed handling. Should it be moved down? craig.topper: This comment is about the code after the compressed handling. Should it be moved down?
		lcvon007AuthorUnsubmitted Done Reply Inline Actions I add 'at most' in the comment but I don't move down them, because 2048-StackAlign is the background for the following codes, so I think it's better to leave them here as before, and I add more comments, is it ok? lcvon007: I add 'at most' in the comment but I don't move down them, because 2048-StackAlign is the…
// cause sp = sp + 2048 in the epilogue to be split into multiple		// cause sp = sp + 2048 in the epilogue to be split into multiple
// instructions. Offsets smaller than 2048 can fit in a single load/store		// instructions. Offsets smaller than 2048 can fit in a single load/store
// instruction, and we have to stick with the stack alignment. 2048 has		// instruction, and we have to stick with the stack alignment. 2048 has
// 16-byte alignment. The stack alignment for RV32 and RV64 is 16 and for		// 16-byte alignment. The stack alignment for RV32 and RV64 is 16 and for
// RV32E it is 4. So (2048 - StackAlign) will satisfy the stack alignment.		// RV32E it is 4. So (2048 - StackAlign) will satisfy the stack alignment.
return 2048 - getStackAlign().value();		const uint64_t StackAlign = getStackAlign().value();
		uint64_t FirstSPAmount = 2048 - StackAlign;
		// Adjust the FirstSP amount to make stack inst be compressed.
		if (STI.hasStdExtC()) {
		craig.topperUnsubmitted Done Reply Inline Actions What about Zca? craig.topper: What about Zca?
		lcvon007AuthorUnsubmitted Done Reply Inline Actions I agree with you about Zca, and have added it yet. lcvon007: I agree with you about Zca, and have added it yet.
		// riscv32: c.lwsp rd, offset[7:2] => 2^(6+2)
		const uint64_t RV32CompressLen = 256;
		craig.topperUnsubmitted Done Reply Inline Actions Can we compute CompressLen for both RV32 and RV64 as XLen * 8? And then merge the 2 if statements? craig.topper: Can we compute CompressLen for both RV32 and RV64 as XLen * 8? And then merge the 2 if…
		lcvon007AuthorUnsubmitted Done Reply Inline Actions done, thanks for your nice advice lcvon007: done, thanks for your nice advice
		// riscv64: c.lwsp rd, offset[8:3] => 2^(6+3)
		craig.topperUnsubmitted Done Reply Inline Actions c.ldsp? craig.topper: c.ldsp?
		lcvon007AuthorUnsubmitted Done Reply Inline Actions done, add other related instructions case lcvon007: done, add other related instructions case
		const uint64_t RV64CompressLen = 512;
		// Avoid increasing extra instructions when inst can be compressed.
		if (STI.getXLen() == 32 && (StackSize <= RV32CompressLen + 2048 \|\|
		StackSize > 2048 * 3 - StackAlign))
		FirstSPAmount = 256;
		craig.topper: return 256
		else if (STI.getXLen() == 64 && (StackSize <= RV64CompressLen + 2048 ||
		craig.topper: Use `* 8`. Let the compiler convert it to a shift.
		lcvon007 (author): done
		StackSize > 2048 * 3 - StackAlign))
		FirstSPAmount = 512;
		craig.topper: return 512
		wangpc: Do we really need this condition `(StackSize <= RVCompressLen + 2048 || StackSize > 2048 * 3 - StackAlign)`?
		lcvon007 (author): As you can see from the first version of the implementation, without this condition the patch adds extra instructions, because the second SP amount may be too large and need more instructions to build the immediate. Performance may regress in some cases. It would be better to remove this condition only if we want to optimize purely for code size.
		wangpc: If so, then I think we can loosen the condition when optimizing for size? What do you think about it?
		lcvon007 (author): Yes. Is it OK to add this feature in another commit? I actually have other ideas for decreasing the code size too, so I can try them together.
		wangpc: Is `2048 * 3` an empirical value? I'm sorry that I can't figure out the reason. If we want the second adjustment to be larger than 2048, should it be `2048 * 2 - StackAlign`?
		lcvon007 (author): Instruction sequences per stack-size range:
		[2048, 2048 + 512]: before opt: addi + addi; after opt: addi + addi
		(2048 + 512, 2048 * 2 - StackAlign]: before opt: addi + addi; after opt: addi + addi + addi (not good)
		(2048 * 2 - StackAlign, 2048 * 2 + 512]: before opt: addi + addi + addi; after opt: addi + addi + addi
		(2048 * 2 + 512, 2048 * 3 - StackAlign]: before opt: addi + addi + addi; after opt: addi + lui + addi + addi sp, sp, x?
		(2048 * 3 - StackAlign, +inf): before opt: addi + lui + addi + addi + addi sp, sp, x?; after opt: addi + lui + addi + addi sp, sp, x?
		So it's not good to use `2048 * 2 - StackAlign`, but I may add an extra condition like `if (StackSize <= RVCompressLen + 2048 || (StackSize > 2048 * 2 - StackAlign && StackSize <= 2048 * 2 + 512) || StackSize > 2048 * 3 - StackAlign)`.
		wangpc: OK, I get it. Can we add more tests for different stack sizes as you said? And some suggestions on the test: 1) the MIR in the test can be further reduced (for example, is the call to `memset` really needed?), 2) rename the functions to `stack_size_n` where `n` is the stack size.
		lcvon007 (author): 1. I have added a new function to the test cases for the newly added condition. 2. I have removed the memset and renamed the functions. I have two questions and would like your help: the CI has failed although the test passes locally; do you know why it fails? Also, I don't have commit access yet; could you help me submit this patch when it's ready? Thanks very much.
		craig.topper: I think `2048 * 2 - StackAlign` was not addi, addi with the old code. The FirstSPAdjust would be 2048-StackAlign or 2040. That leaves 2048 left, which can't be represented by an ADDI. So I think `2048 * 2 - StackAlign` can use the compressed code? I didn't recheck the other boundaries yet.
		lcvon007 (author): I have now excluded `2048 * 2 - StackAlign`.
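The boundary argument in this thread comes down to whether the remaining SP adjustment still fits the signed 12-bit immediate of ADDI. A minimal sketch of that check follows; `fitsImm12` and `secondSPAdjust` are hypothetical helper names used for illustration, not functions from the patch, and the sketch does not model how LLVM actually materializes larger adjustments.

```cpp
#include <cassert>
#include <cstdint>

// True if Value fits the signed 12-bit immediate field of ADDI,
// i.e. lies in the range [-2048, 2047].
bool fitsImm12(int64_t Value) { return Value >= -2048 && Value <= 2047; }

// Hypothetical helper: the SP adjustment that remains after the first step.
int64_t secondSPAdjust(int64_t StackSize, int64_t FirstSPAdjust) {
  return StackSize - FirstSPAdjust;
}
```

For example, with StackAlign = 16 and StackSize = 2048 * 2 - 16 = 4080, the old first adjustment of 2032 leaves 2048, which does not fit a positive ADDI immediate. That is the case craig.topper points out above.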
		craig.topper: Isn't CSI.size() the number of registers being saved? RVCompressLen is 256 or 512. Won't CSI.size() always be less than that? They aren't the same units.
		lcvon007 (author): You're right, they're not the same units, and CSI.size() is always less than RVCompressLen. I have removed this check. Thanks for your advice.
		}
		return FirstSPAmount;
		craig.topper: return 2048 - StackAlign
		lcvon007 (author): done
}
return 0;
}
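Putting the review suggestions together (a single `XLen * 8` compress length and early returns), the selection logic shown in this revision can be sketched as a standalone function. This is an illustration only: `firstSPAdjustSketch` and its parameters are hypothetical, the real code is a RISCVFrameLowering member that reads these values from the subtarget, and the committed version may use a different condition (the thread above discusses an extra range). The sketch assumes the caller has already decided that the stack adjustment must be split.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical standalone version of the first-SP-adjustment choice.
// HasCompressed stands in for the subtarget check (C or Zca).
uint64_t firstSPAdjustSketch(uint64_t StackSize, uint64_t StackAlign,
                             unsigned XLen, bool HasCompressed) {
  if (HasCompressed) {
    // Reach of the compressed SP-relative loads/stores:
    // 256 bytes on RV32 (c.lwsp/c.swsp), 512 bytes on RV64 (c.ldsp/c.sdsp).
    const uint64_t CompressLen = XLen * 8;
    // Condition as shown in this revision of the diff: only shrink the first
    // adjustment for the stack-size ranges where the author expects the
    // remaining adjustment not to cost extra instructions.
    if (StackSize <= CompressLen + 2048 || StackSize > 2048 * 3 - StackAlign)
      return CompressLen;
  }
  // Default: largest aligned amount below 2048.
  return 2048 - StackAlign;
}
```

With StackAlign = 16, a 2096-byte frame yields 256 on RV32 and 512 on RV64, and a 6176-byte frame does the same, which agrees with the first ADDI in the compressed CHECK lines of the accompanying test.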

bool RISCVFrameLowering::spillCalleeSavedRegisters(
MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
if (CSI.empty())

llvm/test/CodeGen/RISCV/stack-inst-compress.mir

This file was added.

				# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2
				# RUN: llc -march=riscv32 -x mir -run-pass=prologepilog -verify-machineinstrs < %s \
				# RUN: | FileCheck -check-prefixes=CHECK-RV32-NO-COM %s
				# RUN: llc -march=riscv32 -mattr=+c -x mir -run-pass=prologepilog \
				# RUN: -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK-RV32-COM %s
				# RUN: llc -march=riscv64 -x mir -run-pass=prologepilog -verify-machineinstrs < %s \
				# RUN: | FileCheck -check-prefixes=CHECK-RV64-NO-COM %s
				# RUN: llc -march=riscv64 -mattr=+c -x mir -run-pass=prologepilog \
				# RUN: -verify-machineinstrs < %s | FileCheck -check-prefixes=CHECK-RV64-COM %s
				--- |
				define dso_local void @_Z18caller_small_stackv() {
				entry:
				%arr = alloca [517 x i32], align 4
				call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 2068, i1 false)
				wangpc: Nit: the LLVM IR in a MIR test can be just a function stub; the function body can be removed.
				lcvon007 (author): Do you mean I should provide only a declaration here, like `declare dso_local void @_Z18caller_small_stackv()`? With that, the compiler reports the error "basic block 'entry' is not defined in the function '_Z18caller_small_stackv'". Does it need another change too, or should I keep the body as it is now?
				wangpc:
--- a/llvm/test/CodeGen/RISCV/stack-inst-compress.mir
+++ b/llvm/test/CodeGen/RISCV/stack-inst-compress.mir
@@ -10,23 +10,13 @@
 --- |
 define dso_local void @_Z18caller_small_stackv() {
 entry:
- %arr = alloca [517 x i32], align 4
- call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 2068, i1 false)
- %arraydecay = getelementptr inbounds [517 x i32], ptr %arr, i64 0, i64 0
- call void @_Z6calleePi(ptr noundef %arraydecay)
 ret void
 }
- declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)
- declare dso_local void @_Z6calleePi(ptr noundef)
 define dso_local void @_Z19caller_larger_stackv() {
 entry:
- %arr = alloca [1536 x i32], align 4
- call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 6144, i1 false)
- %arraydecay = getelementptr inbounds [1536 x i32], ptr %arr, i64 0, i64 0
- call void @_Z6calleePi(ptr noundef %arraydecay)
 ret void
 }
@@ -40,7 +30,7 @@ frameInfo:
 hasCalls: true
 localFrameSize: 2068
 stack:
- - { id: 0, name: arr, size: 2068, alignment: 4, local-offset: -2068 }
+ - { id: 0, size: 2068, alignment: 4, local-offset: -2068 }
 - { id: 1, type: spill-slot, size: 8, alignment: 8 }
 machineFunctionInfo:
 varArgsFrameIndex: 0
@@ -93,7 +83,7 @@ body: |
 ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
 renamable $x10 = LUI 1
 renamable $x12 = ADDIW killed renamable $x10, -2028
- renamable $x10 = ADDI %stack.0.arr, 0
+ renamable $x10 = ADDI %stack.0, 0
 SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
 renamable $x11 = COPY $x0
 PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10
@@ -115,7 +105,7 @@ frameInfo:
 hasCalls: true
 localFrameSize: 6144
 stack:
- - { id: 0, name: arr, size: 6144, alignment: 4, local-offset: -6144 }
+ - { id: 0, size: 6144, alignment: 4, local-offset: -6144 }
 - { id: 1, type: spill-slot, size: 8, alignment: 8 }
 machineFunctionInfo:
 varArgsFrameIndex: 0
@@ -184,7 +174,7 @@ body: |
 ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
 renamable $x10 = ADDI $x0, 3
 renamable $x12 = SLLI killed renamable $x10, 11
- renamable $x10 = ADDI %stack.0.arr, 0
+ renamable $x10 = ADDI %stack.0, 0
 SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
 renamable $x11 = COPY $x0
 PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10
				And if you are using `utils/update_mir_test_checks.py` to generate the CHECKs, don't remove the output lines. Otherwise you should remove line 1: `# NOTE: Assertions have been autogenerated by utils/update_mir_test_checks.py UTC_ARGS: --version 2`
				lcvon007 (author): Done. I used update_mir_test_checks.py and removed some of the checks, so I have removed line 1 as you suggested. Thanks very much.
				%arraydecay = getelementptr inbounds [517 x i32], ptr %arr, i64 0, i64 0
				call void @_Z6calleePi(ptr noundef %arraydecay)
				ret void
				}

				declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg)

				declare dso_local void @_Z6calleePi(ptr noundef)

				define dso_local void @_Z19caller_larger_stackv() {
				entry:
				%arr = alloca [1536 x i32], align 4
				call void @llvm.memset.p0.i64(ptr align 4 %arr, i8 0, i64 6144, i1 false)
				%arraydecay = getelementptr inbounds [1536 x i32], ptr %arr, i64 0, i64 0
				call void @_Z6calleePi(ptr noundef %arraydecay)
				ret void
				}

				...
				---
				name: _Z18caller_small_stackv
				alignment: 2
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 8
				hasCalls: true
				localFrameSize: 2068
				stack:
				- { id: 0, name: arr, size: 2068, alignment: 4, local-offset: -2068 }
				- { id: 1, type: spill-slot, size: 8, alignment: 8 }
				machineFunctionInfo:
				varArgsFrameIndex: 0
				varArgsSaveSize: 0
				body: |
				bb.0.entry:
				; CHECK-RV32-NO-COM-LABEL: name: _Z18caller_small_stackv
				; CHECK-RV32-NO-COM: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV32-NO-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV32-NO-COM-NEXT: SW killed $x1, $x2, 2028 :: (store (s32) into %stack.2)
				; CHECK-RV32-NO-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -4
				; CHECK-RV32-NO-COM-NEXT: $x2 = frame-setup ADDI $x2, -64

				; CHECK-RV32-NO-COM: $x2 = frame-destroy ADDI $x2, 64
				; CHECK-RV32-NO-COM-NEXT: $x1 = LW $x2, 2028 :: (load (s32) from %stack.2)
				; CHECK-RV32-NO-COM-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				;
				; CHECK-RV32-COM-LABEL: name: _Z18caller_small_stackv
				; CHECK-RV32-COM: $x2 = frame-setup ADDI $x2, -256
				; CHECK-RV32-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 256
				; CHECK-RV32-COM-NEXT: SW killed $x1, $x2, 252 :: (store (s32) into %stack.2)
				; CHECK-RV32-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -4
				; CHECK-RV32-COM-NEXT: $x2 = frame-setup ADDI $x2, -1840

				; CHECK-RV32-COM: $x2 = frame-destroy ADDI $x2, 1840
				; CHECK-RV32-COM-NEXT: $x1 = LW $x2, 252 :: (load (s32) from %stack.2)
				; CHECK-RV32-COM-NEXT: $x2 = frame-destroy ADDI $x2, 256
				;
				; CHECK-RV64-NO-COM-LABEL: name: _Z18caller_small_stackv
				; CHECK-RV64-NO-COM: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-NO-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-NO-COM-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.2)
				; CHECK-RV64-NO-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-NO-COM-NEXT: $x2 = frame-setup ADDI $x2, -64

				; CHECK-RV64-NO-COM: $x2 = frame-destroy ADDI $x2, 64
				; CHECK-RV64-NO-COM-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.2)
				; CHECK-RV64-NO-COM-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				;
				; CHECK-RV64-COM-LABEL: name: _Z18caller_small_stackv
				; CHECK-RV64-COM: $x2 = frame-setup ADDI $x2, -512
				; CHECK-RV64-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 512
				; CHECK-RV64-COM-NEXT: SD killed $x1, $x2, 504 :: (store (s64) into %stack.2)
				; CHECK-RV64-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-COM-NEXT: $x2 = frame-setup ADDI $x2, -1584

				; CHECK-RV64-COM: $x2 = frame-destroy ADDI $x2, 1584
				; CHECK-RV64-COM-NEXT: $x1 = LD $x2, 504 :: (load (s64) from %stack.2)
				; CHECK-RV64-COM-NEXT: $x2 = frame-destroy ADDI $x2, 512
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				renamable $x10 = LUI 1
				renamable $x12 = ADDIW killed renamable $x10, -2028
				renamable $x10 = ADDI %stack.0.arr, 0
				SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
				renamable $x11 = COPY $x0
				PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				dead renamable $x11 = COPY $x10
				$x10 = LD %stack.1, 0 :: (load (s64) from %stack.1)
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				PseudoCALL target-flags(riscv-call) @_Z6calleePi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				PseudoRET

				...
				---
				name: _Z19caller_larger_stackv
				alignment: 2
				tracksRegLiveness: true
				frameInfo:
				maxAlignment: 8
				hasCalls: true
				localFrameSize: 6144
				stack:
				- { id: 0, name: arr, size: 6144, alignment: 4, local-offset: -6144 }
				- { id: 1, type: spill-slot, size: 8, alignment: 8 }
				machineFunctionInfo:
				varArgsFrameIndex: 0
				varArgsSaveSize: 0
				body: |
				bb.0.entry:
				; CHECK-RV32-NO-COM-LABEL: name: _Z19caller_larger_stackv
				; CHECK-RV32-NO-COM: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV32-NO-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV32-NO-COM-NEXT: SW killed $x1, $x2, 2028 :: (store (s32) into %stack.2)
				; CHECK-RV32-NO-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -4
				; CHECK-RV32-NO-COM-NEXT: $x10 = frame-setup LUI 1
				; CHECK-RV32-NO-COM-NEXT: $x10 = frame-setup ADDI killed $x10, 48
				; CHECK-RV32-NO-COM-NEXT: $x2 = frame-setup SUB $x2, killed $x10

				; CHECK-RV32-NO-COM: $x10 = frame-destroy LUI 1
				; CHECK-RV32-NO-COM-NEXT: $x10 = frame-destroy ADDI killed $x10, 48
				; CHECK-RV32-NO-COM-NEXT: $x2 = frame-destroy ADD $x2, killed $x10
				; CHECK-RV32-NO-COM-NEXT: $x1 = LW $x2, 2028 :: (load (s32) from %stack.2)
				; CHECK-RV32-NO-COM-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				;
				; CHECK-RV32-COM-LABEL: name: _Z19caller_larger_stackv
				; CHECK-RV32-COM: $x2 = frame-setup ADDI $x2, -256
				; CHECK-RV32-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 256
				; CHECK-RV32-COM-NEXT: SW killed $x1, $x2, 252 :: (store (s32) into %stack.2)
				; CHECK-RV32-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -4
				; CHECK-RV32-COM-NEXT: $x10 = frame-setup LUI 1
				; CHECK-RV32-COM-NEXT: $x10 = frame-setup ADDI killed $x10, 1824
				; CHECK-RV32-COM-NEXT: $x2 = frame-setup SUB $x2, killed $x10

				; CHECK-RV32-COM: $x10 = frame-destroy LUI 1
				; CHECK-RV32-COM-NEXT: $x10 = frame-destroy ADDI killed $x10, 1824
				; CHECK-RV32-COM-NEXT: $x2 = frame-destroy ADD $x2, killed $x10
				; CHECK-RV32-COM-NEXT: $x1 = LW $x2, 252 :: (load (s32) from %stack.2)
				; CHECK-RV32-COM-NEXT: $x2 = frame-destroy ADDI $x2, 256
				;
				; CHECK-RV64-NO-COM-LABEL: name: _Z19caller_larger_stackv
				; CHECK-RV64-NO-COM: $x2 = frame-setup ADDI $x2, -2032
				; CHECK-RV64-NO-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 2032
				; CHECK-RV64-NO-COM-NEXT: SD killed $x1, $x2, 2024 :: (store (s64) into %stack.2)
				; CHECK-RV64-NO-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-NO-COM-NEXT: $x10 = frame-setup LUI 1
				; CHECK-RV64-NO-COM-NEXT: $x10 = frame-setup ADDIW killed $x10, 48
				; CHECK-RV64-NO-COM-NEXT: $x2 = frame-setup SUB $x2, killed $x10

				; CHECK-RV64-NO-COM: $x10 = frame-destroy LUI 1
				; CHECK-RV64-NO-COM-NEXT: $x10 = frame-destroy ADDIW killed $x10, 48
				; CHECK-RV64-NO-COM-NEXT: $x2 = frame-destroy ADD $x2, killed $x10
				; CHECK-RV64-NO-COM-NEXT: $x1 = LD $x2, 2024 :: (load (s64) from %stack.2)
				; CHECK-RV64-NO-COM-NEXT: $x2 = frame-destroy ADDI $x2, 2032
				;
				; CHECK-RV64-COM-LABEL: name: _Z19caller_larger_stackv
				; CHECK-RV64-COM: $x2 = frame-setup ADDI $x2, -512
				; CHECK-RV64-COM-NEXT: frame-setup CFI_INSTRUCTION def_cfa_offset 512
				; CHECK-RV64-COM-NEXT: SD killed $x1, $x2, 504 :: (store (s64) into %stack.2)
				; CHECK-RV64-COM-NEXT: frame-setup CFI_INSTRUCTION offset $x1, -8
				; CHECK-RV64-COM-NEXT: $x10 = frame-setup LUI 1
				; CHECK-RV64-COM-NEXT: $x10 = frame-setup ADDIW killed $x10, 1568
				; CHECK-RV64-COM-NEXT: $x2 = frame-setup SUB $x2, killed $x10

				; CHECK-RV64-COM: $x10 = frame-destroy LUI 1
				; CHECK-RV64-COM-NEXT: $x10 = frame-destroy ADDIW killed $x10, 1568
				; CHECK-RV64-COM-NEXT: $x2 = frame-destroy ADD $x2, killed $x10
				; CHECK-RV64-COM-NEXT: $x1 = LD $x2, 504 :: (load (s64) from %stack.2)
				; CHECK-RV64-COM-NEXT: $x2 = frame-destroy ADDI $x2, 512
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				renamable $x10 = ADDI $x0, 3
				renamable $x12 = SLLI killed renamable $x10, 11
				renamable $x10 = ADDI %stack.0.arr, 0
				SD $x10, %stack.1, 0 :: (store (s64) into %stack.1)
				renamable $x11 = COPY $x0
				PseudoCALL target-flags(riscv-plt) &memset, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit killed $x11, implicit killed $x12, implicit-def $x2, implicit-def $x10
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				dead renamable $x11 = COPY $x10
				$x10 = LD %stack.1, 0 :: (load (s64) from %stack.1)
				ADJCALLSTACKDOWN 0, 0, implicit-def dead $x2, implicit $x2
				PseudoCALL target-flags(riscv-call) @_Z6calleePi, csr_ilp32_lp64, implicit-def dead $x1, implicit killed $x10, implicit-def $x2
				ADJCALLSTACKUP 0, 0, implicit-def dead $x2, implicit $x2
				PseudoRET

				...
				## NOTE: These prefixes are unused and the list is autogenerated. Do not add tests below this line:
				# CHECK-RV32-COM: {{.*}}
				# CHECK-RV32-NO-COM: {{.*}}
				# CHECK-RV64-COM: {{.*}}
				# CHECK-RV64-NO-COM: {{.*}}
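The offsets in the CHECK lines of this test can be sanity-checked against the compressed SP-relative encodings, which carry a 6-bit unsigned immediate scaled by the access size. The helper below is a hypothetical sketch for illustration, not part of the patch; it looks only at the offset and ignores other compression constraints such as register restrictions.

```cpp
#include <cassert>
#include <cstdint>

// AccessBytes is 4 for c.lwsp/c.swsp and 8 for c.ldsp/c.sdsp.
// The offset must be a multiple of the access size and fit a 6-bit
// unsigned immediate after scaling, i.e. be below 64 * AccessBytes.
bool isCompressibleSPOffset(uint64_t Offset, uint64_t AccessBytes) {
  return Offset % AccessBytes == 0 && Offset < 64 * AccessBytes;
}
```

Under this check, the offsets 252 (RV32) and 504 (RV64) from the compressed runs qualify, while 2028 and 2024 from the uncompressed runs do not.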

Revision Contents

Diff 549026