Currently, each constant is placed in the TOC individually, and every access to it needs addis/addi plus a load. For example:
double X(double Y) { return (Y*1.23 + 4.512)*2.34 + 14.38; }
And this is what we have now:
        addis 2, 12, .TOC.-.Lfunc_gep0@ha
        addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
        .localentry X, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 3, 2, .LCPI0_0@toc@ha
        lfd 0, .LCPI0_0@toc@l(3)        #<-- addi is folding into lfd
        addis 3, 2, .LCPI0_1@toc@ha
        xsmuldp 0, 1, 0
        lfd 1, .LCPI0_1@toc@l(3)
        addis 3, 2, .LCPI0_2@toc@ha
        xsadddp 0, 0, 1
        lfd 1, .LCPI0_2@toc@l(3)
        addis 3, 2, .LCPI0_3@toc@ha
        xsmuldp 0, 0, 1
        lfd 1, .LCPI0_3@toc@l(3)
        xsadddp 1, 0, 1
        blr
This can be improved by grouping all the constants together in a read-only data section, so that their relative positions are fixed, and then creating a single symbol in the TOC that points to that data section. The benefit is a smaller GOT and better performance, because the per-constant addis is no longer needed. It works like this:
        .section .data.rel.ro,"aw",@progbits
        .p2align 3                      # -- Begin function X
.LCPI0_0:
        .quad 0x402cc28f5c28f5c3        # double 14.380000000000001
        .quad 0x4002b851eb851eb8        # double 2.3399999999999999
        .quad 0x40120c49ba5e353f        # double 4.5119999999999996
        .quad 0x3ff3ae147ae147ae        # double 1.23
.Lfunc_gep0:
        addis 2, 12, .TOC.-.Lfunc_gep0@ha
        addi 2, 2, .TOC.-.Lfunc_gep0@l
.Lfunc_lep0:
        .localentry X, .Lfunc_lep0-.Lfunc_gep0
# %bb.0:                                # %entry
        addis 3, 2, .LC0@toc@ha
        ld 3, .LC0@toc@l(3)
        lfd 0, 24(3)
        xsmuldp 0, 1, 0
        lfd 1, 16(3)
        xsadddp 0, 0, 1
        lfd 1, 8(3)
        xsmuldp 0, 0, 1
        lfdx 1, 0, 3
        xsadddp 1, 0, 1
        blr
.LC0:
        .tc .LCPI0_0[TC],.LCPI0_0
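For readers who prefer source over assembly, here is a rough C analogue of the pooled layout. It is only a sketch: the names X_pool and X_pooled are made up for illustration and do not appear in the patch, and the array order is simply copied from the .quad entries after .LCPI0_0.

```c
/* Hypothetical source-level analogue of the pooled layout: all four
   constants live in one read-only array, in the same order as the
   .quad entries after .LCPI0_0. */
static const double X_pool[4] = {
    14.38, /* byte offset  0 */
    2.34,  /* byte offset  8 */
    4.512, /* byte offset 16 */
    1.23,  /* byte offset 24 */
};

/* Original function from the description. */
double X(double Y) { return (Y * 1.23 + 4.512) * 2.34 + 14.38; }

/* Same computation, but every constant is read at a fixed offset from
   one base pointer, mirroring the single "ld 3, .LC0@toc@l(3)" followed
   by the loads at offsets 24, 16, 8 and 0. */
double X_pooled(double Y) {
    const double *base = X_pool; /* one address materialization */
    return (Y * base[3] + base[2]) * base[1] + base[0];
}
```

Counting only the instructions that fetch constants in the two listings, the current code uses 8 (four addis plus four lfd), while the pooled version uses 6 (one addis, one ld, three lfd and one lfdx).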
This optimization has been discussed before. See PowerPC/README.txt for more information.
Lump the constant pool for each function into ONE pic object, and reference
pieces of it as offsets from the start. For functions like this (contrived
to have lots of constants obviously):

double X(double Y) { return (Y*1.23 + 4.512)*2.34 + 14.38; }

We generate:

_X:
        lis r2, ha16(.CPI_X_0)
        lfd f0, lo16(.CPI_X_0)(r2)
        lis r2, ha16(.CPI_X_1)
        lfd f2, lo16(.CPI_X_1)(r2)
        fmadd f0, f1, f0, f2
        lis r2, ha16(.CPI_X_2)
        lfd f1, lo16(.CPI_X_2)(r2)
        lis r2, ha16(.CPI_X_3)
        lfd f2, lo16(.CPI_X_3)(r2)
        fmadd f1, f0, f1, f2
        blr

It would be better to materialize .CPI_X into a register, then use immediates
off of the register to avoid the lis's. This is even more important in PIC
mode.

Note that this (and the static variable version) is discussed here for GCC:
http://gcc.gnu.org/ml/gcc-patches/2006-02/msg00133.html

Here's another example (the sgn function):

double testf(double a) {
       return a == 0.0 ? 0.0 : (a > 0.0 ? 1.0 : -1.0);
}

it produces a BB like this:

LBB1_1: ; cond_true
        lis r2, ha16(LCPI1_0)
        lfs f0, lo16(LCPI1_0)(r2)
        lis r2, ha16(LCPI1_1)
        lis r3, ha16(LCPI1_2)
        lfs f2, lo16(LCPI1_2)(r3)
        lfs f3, lo16(LCPI1_1)(r2)
        fsub f0, f0, f1
        fsel f1, f0, f2, f3
        blr
Some limitations:
- If there is only one constant, this patch adds one extra load. That load could be optimized away by the linker if it merges the TOC. It is not easy to handle inside the compiler, because instruction selection is done per basic block and we don't know whether there are other constants until the other basic blocks have been selected. Any thoughts?
- Only constants of the same type are lumped together. Technically, all the constants could be lumped together as long as the alignment is handled carefully.
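To make the alignment point concrete, here is a small sketch of how offsets could be assigned if constants of different types shared one pool. It is not code from the patch: struct pool_entry, align_up and assign_offsets are hypothetical names, and entries are simply placed in the order given.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one pool entry; not an LLVM type. */
struct pool_entry {
    size_t size;   /* size in bytes */
    size_t align;  /* required alignment, must be a power of two */
    size_t offset; /* filled in by assign_offsets */
};

/* Round off up to the next multiple of align (a power of two). */
static size_t align_up(size_t off, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (off + align - 1) & ~(align - 1);
}

/* Place the entries in order, padding as needed, and return the total
   pool size in bytes. */
size_t assign_offsets(struct pool_entry *entries, size_t count) {
    size_t off = 0;
    for (size_t i = 0; i < count; ++i) {
        off = align_up(off, entries[i].align);
        entries[i].offset = off;
        off += entries[i].size;
    }
    return off;
}
```

With this scheme a float, a double, a float and a 16-byte vector constant land at offsets 0, 8, 16 and 32, for 48 bytes in total. The pool itself would have to be aligned to the largest member alignment, and sorting entries by decreasing alignment would remove the padding.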
Can this also support AIX?